[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-25 Thread Richardson, Bruce


> -Original Message-
> From: Van Haaren, Harry
> Sent: Friday, November 25, 2016 11:59 AM
> To: Jerin Jacob ; Thomas Monjalon
> 
> Cc: dev at dpdk.org; Richardson, Bruce ;
> hemant.agrawal at nxp.com; Eads, Gage 
> Subject: RE: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven
> programming model
> 
> Hi All,
> 
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Friday, November 25, 2016 12:24 AM
> > To: Thomas Monjalon 
> > Cc: dev at dpdk.org; Richardson, Bruce ; Van
> > Haaren, Harry ; hemant.agrawal at nxp.com;
> > Eads, Gage 
> > Subject: Re: [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven
> > programming model
> >
> > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > 2016-11-24 07:29, Jerin Jacob:
> > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-18 11:14, Jerin Jacob:
>
> > > > > > +#define RTE_EVENT_TYPE_ETHDEV   0x0
> > > > > > +/**< The event generated from ethdev subsystem */
> > > > > > +#define RTE_EVENT_TYPE_CRYPTODEV0x1
> > > > > > +/**< The event generated from crypodev subsystem */
> > > > > > +#define RTE_EVENT_TYPE_TIMERDEV 0x2
> > > > > > +/**< The event generated from timerdev subsystem */
> > > > > > +#define RTE_EVENT_TYPE_CORE 0x3
> > > > > > +/**< The event generated from core.
> > > > >
> > > > > What is core?
> > > >
> > > > The event are generated by lcore for pipeling. Any suggestion for
> > > > better name? lcore?
> > >
> > > What about CPU or SW?
> >
> > No strong opinion here. I will go with CPU then
> 
> 
> +1 for CPU (as SW is the software PMD name).
> 

Fine, I'm outvoted. I'll learn to live with it. :-)

/Bruce


[dpdk-dev] [RFC PATCH 0/7] RFC: EventDev Software PMD

2016-11-22 Thread Richardson, Bruce


> -Original Message-
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Monday, November 21, 2016 8:19 PM
> To: Richardson, Bruce 
> Cc: Van Haaren, Harry ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] RFC: EventDev Software PMD
> 
> On Mon, Nov 21, 2016 at 09:48:56AM +, Bruce Richardson wrote:
> > On Sat, Nov 19, 2016 at 03:53:25AM +0530, Jerin Jacob wrote:
> > > On Thu, Nov 17, 2016 at 10:05:07AM +, Bruce Richardson wrote:
> > > > > 2) device stats API can be based on capability, HW
> > > > > implementations may not support all the stats
> > > >
> > > > Yes, this is something we were thinking about. It would be nice if
> > > > we could at least come up with a common set of stats - maybe even
> > > > ones tracked at an eventdev API level, e.g. nb enqueues/dequeues.
> > > > As well as that, we think the idea of an xstats API, like in
> > > > ethdev, might work well. For our software implementation, having
> > > > visibility into the scheduler behaviour can be important, so we'd
> > > > like a way to report out things like internal queue depths etc.
> > > >
> > >
> > > Since these are not very generic hardware, I am not sure how much
> > > sense to have generic stats API. But, Something similar to ethdev's
> > > xstat(any capability based) the scheme works well. Look forward to
> seeing API proposal with common code.
> > >
> > > Jerin
> > >
> > Well, to start off with, some stats that could be tracked at the API
> > level could be common. What about counts of number of enqueues and
> > dequeues?
> >
> > I suppose the other way we can look at this is: once we get a few
> > implementations of the interface, we can look at the provided xstats
> > values from each one, and see if there is anything common between them.
> 
> That makes more sense to me as we don't have proposed counts. I think,
> Then we should not use stats for functional tests as proposed. We could
> verify the functional test by embedding some value in event object on
> enqueue and later check the same on dequeue kind of scheme.
> 
> Jerin
> 

Yes, that can work. Many of the unit tests we are looking at are likely
specific to our software implementation, so we may end up doing a separate
sw-eventdev specific unit test set, as well as a general eventdev set.

/Bruce


[dpdk-dev] [PATCH 7/7] examples/eventdev_pipeline: adding example

2016-11-22 Thread Richardson, Bruce


> -Original Message-
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Tuesday, November 22, 2016 6:02 AM
> To: Van Haaren, Harry 
> Cc: dev at dpdk.org; Eads, Gage ; Richardson, Bruce
> 
> Subject: Re: [dpdk-dev] [PATCH 7/7] examples/eventdev_pipeline: adding
> example
> 
> On Wed, Nov 16, 2016 at 06:00:07PM +, Harry van Haaren wrote:
> > This patch adds a sample app to the examples/ directory, which can be
> > used as a reference application and for general testing. The
> > application requires two ethdev ports and expects traffic to be
> > flowing. The application must be run with the --vdev flags as follows
> > to indicate to EAL that a virtual eventdev device called "evdev_sw0" is
> available to be used:
> >
> > ./build/eventdev_pipeline --vdev evdev_sw0
> >
> > The general flow of the traffic is as follows:
> >
> > Rx core -> Atomic Queue => 4 worker cores => TX core
> >
> > A scheduler core is required to do the packet scheduling, making this
> > configuration require 7 cores (Rx, Tx, Scheduler, and 4 workers).
> > Finally a master core brings the core count to 8 for this
> > configuration. The
> 
> Thanks for the example application.I will try to share my views on ethdev
> integration and usability perspective. Hope we can converge.

Hi Jerin, 

thanks for the feedback. We'll take it on board for a subsequent version
we produce. Additional comments and queries on your feedback inline below.

/Bruce

> 
> Some of the high level details first before getting into exact details.
> 
> 1) From the HW and ethdev integration perspective, The integrated NIC
> controllers does not need producer core(s) to push the event/packets to
> event queue. So, I was thinking to use 6WIND rte_flow spec to create the
> "ethdev port to event queue wiring" connection by extending the output
> ACTION definition, which specifies event queue its need to enqueued to for
> the given ethdev port (something your are doing in application).
> 
> I guess, the producer part of this example can be created as common code,
> somewhere in rte_flow/ethdev to reuse. We would need this scheme also
> where when we deal with external nics + HW event manager use case
> 
Yes. This is something to consider.

For the pure-software model, we also might want to look at the opposite
approach, where we register an ethdev with the scheduler for it to "pull"
new packets from. This would allow it to bypass the port logic for the new
packets. 

An alternative for this is to extend the schedule API to allow a burst of
packets to be passed in to be scheduled immediately as "NEW" packets. The end
results should be the same, saving cycles by bypassing unneeded processing
for the new packets.

> The complete event driven model can be verified and exercised without
> integrating with eventdev subsystem. So I think, may be we need to focus
> on functional applications without ethdev to verify the eventdev features
> like(automatic multicore scaling,  dynamic load balancing, pipelining,
> packet ingress order maintenance and synchronization services) and then
> integrate with ethdev

Yes, comprehensive unit tests will be needed too. But an example app that
pulls packets from an external NIC I also think is needed to get a feel
for the performance of the scheduler with real traffic.

> 
> > +   const unsigned cores_needed = num_workers +
> > +   /*main*/1 +
> > +   /*sched*/1 +
> > +   /*TX*/1 +
> > +   /*RX*/1;
> > +
> 
> 2) One of the prime aims of the event driven model is to remove the fixed
> function core mappings and enable automatic multicore scaling,  dynamic
> load balancing etc.I will try to use an example in review section to show
> the method for removing "consumer core" in this case.

Yes, I agree, but unfortunately, for some tasks, distributing those tasks
across multiple cores can hurt performance overall do to resource contention.

> 
> > application can be configured for various numbers of flows and worker
> > cores. Run the application with -h for details.
> >
> > Signed-off-by: Gage Eads 
> > Signed-off-by: Bruce Richardson 
> > Signed-off-by: Harry van Haaren 
> > ---
> >  examples/eventdev_pipeline/Makefile |  49 +++
> >  examples/eventdev_pipeline/main.c   | 718
> 
> >  2 files changed, 767 insertions(+)
> >  create mode 100644 examples/eventdev_pipeline/Makefile
> >  create mode 100644 examples/eventdev_pipeline/main.c
> >
> > +static int sched_type = RTE_SCHED_TYPE_ATOMIC;
> 
> RTE_SCHED_TYPE_O

[dpdk-dev] mbuf changes

2016-10-28 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Friday, October 28, 2016 5:50 PM
> To: Morten Br?rup 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] mbuf changes
> 
> On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Br?rup wrote:
> > Comments at the end.
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pattan, Reshma
> > > Sent: Friday, October 28, 2016 3:35 PM
> > > To: Olivier Matz
> > > Cc: dev at dpdk.org; Morten Br?rup
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > > Hi Olivier,
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > > To: Richardson, Bruce ; Morten Br?rup
> > > > 
> > > > Cc: Adrien Mazarguil ; Wiles, Keith
> > > > ; dev at dpdk.org; Oleg Kuporosov
> > > > 
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > >
> > > >
> > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Br?rup wrote:
> > > > >> Comments at the end.
> > > > >>
> > > > >> Med venlig hilsen / kind regards
> > > > >> - Morten Br?rup
> > > > >>
> > > > >>> -Original Message-
> > > > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > > >>> To: Morten Br?rup
> > > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier
> > > > >>> Matz; Oleg Kuporosov
> > > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>
> > > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Br?rup wrote:
> > > > >>>> Comments inline.
> > > > >>>>
> > > > >>>>> -Original Message-
> > > > >>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > >>>>> Richardson
> > > > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> > > > >>>>> To: Adrien Mazarguil
> > > > >>>>> Cc: Morten Br?rup; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > > >>>>> Oleg Kuporosov
> > > > >>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>>>
> > > > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > > wrote:
> > > > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Br?rup
> wrote:
> > > > >>>>>>> Comments inline.
> > > > >>>>>>>
> > > > >>>>>>> Med venlig hilsen / kind regards
> > > > >>>>>>> - Morten Br?rup
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> -Original Message-
> > > > >>>>>>>> From: Adrien Mazarguil
> > > > >>>>>>>> [mailto:adrien.mazarguil at 6wind.com]
> > > > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> > > > >>>>>>>> To: Bruce Richardson
> > > > >>>>>>>> Cc: Wiles, Keith; Morten Br?rup; dev at dpdk.org; Olivier
> > > > >>>>>>>> Matz; Oleg Kuporosov
> > > > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>>>>>>
> > > > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce
> > > > >>>>>>>> Richardson
> > > > >>> wrote:
> > > > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> > > > >>> wrote:
> > > > >>>>>>>> [...]
> > > > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Br?rup
> > > > >>>>>>>>  wrote:
> > > > >>>>>>>>

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, October 28, 2016 4:13 PM
> To: Yigit, Ferruh 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] KNI discussion in userspace event
> 
> 2016-10-28 15:31, Ferruh Yigit:
> > * virtio-user + vhost-net
> > This can be valid alternative, removes the out of tree kernel module
> > need. But missing control path. Proof of concept work will be done.
> 
> That's probably a smart alternative for packet injection.
> What do you mean exactly by "missing control path"?

We'll have to see how it performs - which is the key gap for data path that KNI 
fills. Until we get an alternative with (nearly) equivalent performance, there 
will be demand for KNI to stick around.
The "control path" is the ethtool part, to get stats and do operations on the 
NIC using command-line tools.

> 
> > * Remove ethtool support ?
> 
> That's the other part of KNI.
> It works only for e1000/ixgbe. That's a niche.

Yes, it's something we need to remove, but again, we need an alternative first.

> 
> > Still there is some interest, will keep it. But not able to extend it
> > to other drivers with current design.
> 
> It should be removed one day.
> We must seriously think about a generic alternative.
> Either we add DPDK support in ethtool or we create a dpdk-ethtool.
> (or at least a library as the one in examples/).

I don't view that as a great path forward. Sure, we can do our own ethtool, but 
then people will look for ifconfig to work, and "ip" to work, etc. I view 
having a kernel proxy module as the best path here as it is tool agnostic on 
the userspace side, rather than trying to make every tool for working with 
kernel netdevs also have support for dpdk ports.

> Or we do nothing and wait to have more hardware like Mellanox supporting a
> kernel bifurcated driver approach.

Given the lack of other NICs supporting that, I think it could be quite a wait! 
Also, it doesn't work for virtio ports, for pcap ports, or any other ports 
which don't have physical hardware backing them. No reason you shouldn't be 
able to pull stats from all your dpdk ethdevs, not just the ones with physical 
hardware. The same ethdev APIs work for them, so should the same tools.

> 
> > *KNI PMD
> > Patch is in the mail list, missing comments. If it gets some
> > interest/comments/acks it may go in to next release.
> 
> I'm not against KNI PMD but it looks strange to add more support to an old
> dying approach.

I think the main idea here is to clean up the API - at least for the data path. 
There is no reason why we need special KNI RX/TX functions, when ethdev RX/TX 
functions could do the job. However, at a higher level, the more basic 
requirement is that whatever solution for the data-path to kernel from dpdk is, 
it needs to appear as an ethdev, and not as a special library with different 
APIs, as KNI is now.

/Bruce



[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, October 28, 2016 11:29 AM
> To: Thomas Monjalon ; Kulasek, TomaszX
> 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Friday, October 28, 2016 11:22 AM
> > To: Ananyev, Konstantin ; Kulasek,
> > TomaszX 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> > 2016-10-28 10:15, Ananyev, Konstantin:
> > > > From: Ananyev, Konstantin
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > --- a/config/common_base
> > > > > > > > > +++ b/config/common_base
> > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > >
> > > > > > > > We cannot enable it until it is implemented in every
> drivers.
> > > > > > >
> > > > > > > Not sure why?
> > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act
> as noop.
> > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > >
> > > > > > If it is not implemented, the application must do the
> > > > > > preparation by
> > > > > itself.
> > > > > > From patch 6:
> > > > > > "
> > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > application and used Tx preparation API for packet preparation
> > > > > > and verification.
> > > > > > "
> > > > > > So how does it behave with other drivers?
> > > > >
> > > > > Hmm so it seems that we broke testpmd csumonly mode for
> > > > > non-intel drivers..
> > > > > My bad, missed that part completely.
> > > > > Yes, then I suppose for now we'll need to support both (with and
> > > > > without) code paths for testpmd.
> > > > > Probably a new fwd mode or just extra parameter for the existing
> one?
> > > > > Any other suggestions?
> > > > >
> > > >
> > > > I had sent txprep engine in v2
> > > > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on
> > > > the suggestions. If you like it I can
> > resent
> > > > it in place of csumonly modification.
> > >
> > > I still not sure it is worth to have another version of csum...
> > > Can we introduce a new global variable in testpmd and a new command:
> > > testpmd> csum tx_prep
> > > or so?
> > > Looking at current testpmd patch, I suppose the changes will be
> minimal.
> > > What do you think?
> >
> > No please no!
> > The problem is not in testpmd.
> > The problem is in every applications.
> > Should we prepare the checksums or let tx_prep do it?
> 
> Not sure, I understood you...
> Right now we don't' change other apps.
> They would work as before.
> If people would like to start to use tx_prep in their apps - they are free
> to do that.
> If they like to keep doing that manually - that's fine too.
> From other side we need an ability to test (and demonstrate) that new
> functionality.
> So we do need changes in testpmd.
> Konstantin
> 

Just my 2c on this:
* given this is new functionality, and no apps are currently using it, I'm not 
sure I see the harm in having the function available by default. We just need 
to be clear about the limits of the function and the fact that apps need to do 
work themselves if the driver doesn't provide the function.
* having it enabled will then allow any apps that want to use it do to so.
* however, for our sample apps, and by default in testpmd, we *shouldn't* use 
this functionality, in the absence of any fallback, so that is where I would 
look to have the enable/disable switch, not in the library.
* going forward, I think a SW fallback inside the ethdev API itself would be a 
good addition to make this fully generic.

Hope this helps, [and also that I haven't missed some subtlety in the 
discussion!]

/Bruce


[dpdk-dev] [PATCH] mempool: adjust name string size in related data types

2016-07-20 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Wednesday, July 20, 2016 2:37 PM
> To: Zoltan Kiss ; Zoltan Kiss
> ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mempool: adjust name string size in
> related data types
> 
> Hi,
> 
> On 07/20/2016 02:41 PM, Zoltan Kiss wrote:
> >
> >
> > On 19/07/16 17:17, Olivier Matz wrote:
> >> Hi Zoltan,
> >>
> >> On 07/19/2016 05:59 PM, Zoltan Kiss wrote:
> >>>
> >>>
> >>> On 19/07/16 16:37, Olivier Matz wrote:
>  Hi Zoltan,
> 
>  On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
> > A recent fix brought up an issue about the size of the 'name'
> fields:
> >
> > 85cf0079 mem: avoid memzone/mempool/ring name truncation
> >
> > These relations should be observed:
> >
> > RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
> > strlen(RTE_RING_MZ_PREFIX) RTE_MEMPOOL_NAMESIZE <=
> > RTE_RING_NAMESIZE -
> > strlen(RTE_MEMPOOL_MZ_PREFIX)
> >
> > Setting all of them to 32 hides this restriction from the
> application.
> > This patch increases the memzone string size to accomodate for
> > these prefixes, and the same happens with the ring name string.
> > The ABI needs to be broken to fix this API issue, this way doesn't
> > break applications previously not failing due to the truncating
> > bug now fixed.
> >
> > Signed-off-by: Zoltan Kiss 
> 
>  I agree it is a problem for an application because it cannot know
>  what is the maximum name length. On the other hand, breaking the
>  ABI for this looks a bit overkill. Maybe we could reduce
>  RTE_MEMPOOL_NAMESIZE and RTE_RING_NAMESIZE instead of increasing
>  RTE_MEMZONE_NAMESIZE? That way, we could keep the ABI as is.
> >>>
> >>> But that would break the ABI too, wouldn't it? Unless you keep the
> >>> array the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.
> >>
> >> Yes, that was the idea.
> >>
> >>> And even then, the API breaks anyway. There are applications - I
> >>> have at least some - which use all 32 bytes to store the name.
> >>> Decrease that would cause headache to change the naming scheme,
> >>> because it's a 30 character long id, and chopping the last few chars
> >>> would cause name collisions and annoying bugs.
> >>
> >> Before my patch (85cf0079), long names were silently truncated when
> >> mempool created its ring and/or memzones. Now, it returns an error.
> >
> > With 16.04 an application could operate as expected if the first 26
> > character were unique. Your patch revealed the problem that characters
> > after these were left out of the name. Now applications fail where
> > this never been a bug because their naming scheme guarantees the
> > uniqueness on the first 26 chars (or makes it very unlikely) Where the
> > first 26 is not unique, it failed earlier too, because at memzone
> > creation it checks for duplicate names.
> 
> Yes, I understand that there is a behavior change for applications using
> names larger than 26 between 16.04 and 16.07. I also understand that there
> is no way for an application to know what is the maximum usable size for a
> mempool or a ring.
> 
> 
> >> I'm not getting why changing the struct to something like below would
> >> break the API, since it would already return an error today.
> >>
> >>#define RTE_MEMPOOL_NAMESIZE \
> >
> > Wait, this would mean applications need to recompile to use the
> > smaller value. AFAIK that's an ABI break too, right? At the moment I
> > don't see a way to fix this without breaking the ABI
> 
> With this modification, if you don't recompile the application, its
> behavior will still be the same as today -> it will return ENAMETOOLONG.
> If you recompile it, the application will be aware of the maximum length.
> To me, it seems to be a acceptable compromise for this release.
> 
> The patch you're proposing also changes the ABI of librte_ring and
> librte_eal, which cannot be accepted for the release.
> 
> 
> >
> >>(RTE_MEMZONE_NAMESIZE - sizeof(pool_prefix) - sizeof(ring
> prefix))
> >>struct rte_mempool {
> >>union {
> >>  char name[RTE_MEMPOOL_NAMESIZE];
> >>  char pad[32];
> >>};
> >>...
> >>}
> >>
> >> Anyway, it may not be the proper solution since it supposes that a
> >> mempool includes a ring based on a memzone, which is not always true
> >> now with mempool handlers.
> >
> > Oh, as we dug deeper it gets better!
> > Indeed, we don't necessarily have this ring + memzone pair underneath,
> > but the user is not aware of that, and I think we should keep it that
> > way. It should only care that the string passed shouldn't be bigger
> > than a certain amount.
> 
> Yes. What I'm just saying here is that it's not a good solution to write
> in the #define that "a mempool is based on a ring + a memzone", because if
> some someone adds a new mempool handler replacing the ring, and also
> creating a memzone 

[dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure

2016-07-19 Thread Richardson, Bruce


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, July 19, 2016 4:04 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org; jerin.jacob at caviumnetworks.com;
> thomas.monjalon at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf
> structure
> 
> Hi Bruce,
> 
> On 07/19/2016 04:40 PM, Bruce Richardson wrote:
> > On Tue, Jul 19, 2016 at 04:01:15PM +0200, Olivier Matz wrote:
> >> For 16.11, the mbuf structure will be modified implying ABI breakage.
> >> Some discussions already took place here:
> >> http://www.dpdk.org/dev/patchwork/patch/12878/
> >>
> >> Signed-off-by: Olivier Matz 
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index f502f86..2245bc2 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -41,3 +41,9 @@ Deprecation Notices
> >>  * The mempool functions for single/multi producer/consumer are
> deprecated and
> >>will be removed in 16.11.
> >>It is replaced by rte_mempool_generic_get/put functions.
> >> +
> >> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure:
> >> +some
> >> +  fields will be reordered to facilitate the writing of
> >> +``data_off``,
> >> +  ``refcnt``, and ``nb_segs`` in one operation. Indeed, some
> >> +platforms
> >> +  have an overhead if the store address is not naturally aligned.
> >> +The
> >> +  useless ``port`` field will also be removed at the same occasion.
> >> --
> >
> > Have we fully bottomed out on the mbuf changes. I'm not sure that once
> > patches start getting considered for merge, new opinions may come
> > forward. For instance, is the "port" field really "useless"?
> >
> > Would it not be better to put in a less specific deprecation notice?
> > What happens if this notice goes in and the final changes are
> > different from those called out here?
> 
> Yes, you are right. What about the following text?
> 
> ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
> fields may be reordered to facilitate the writing of ``data_off``,
> ``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms have
> an overhead if the store address is not naturally aligned. The ``port``
> field may also be removed at the same occasion.
> 
Better. Two suggestions:
1. change "Indeed" to "because" and join the sentences.
2. change the last sentence to be even more general: "Other mbuf fields, such 
as the port field, may be moved or removed as part of this mbuf work".

/Bruce


[dpdk-dev] [PATCH v2 5/5] test: change lpm routes file from header to data file

2016-07-16 Thread Richardson, Bruce


> On 16 Jul 2016, at 00:05, Thomas Monjalon  
> wrote:
> 
> 2016-07-14 17:06, Bruce Richardson:
>> Change the file extension of the test_lpm_routes file from .h to .dat.
>> This makes the lines-of-code counts for DPDK more realistic as they are not
>> affected by the huge counts from the lpm data.
>> 
>> Signed-off-by: Bruce Richardson 
>> ---
>> app/test/Makefile   | 2 +-
>> app/test/{test_lpm_routes.h => test_lpm_routes.dat} | 0
> 
> The size of the git tree significantly increase after renaming this file:
>git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0
>-c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc
>--aggressive && du -sk .git
> 28272   .git
>git am
>git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0
>-c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc
>--aggressive && du -sk .git
> 33188   .git
> 
> I think it would be better to replace this huge file by a script generating
> the test resource.
> 
Completely agree. This was a temporary fix anyway so if it causes git bloat 
lets just wait for the permanent fix.

Bruce 


[dpdk-dev] [PATCH] mempool: rename functions with confusing names

2016-06-29 Thread Richardson, Bruce
> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, June 29, 2016 5:05 PM
> To: Wiles, Keith ; Richardson, Bruce
> ; Thomas Monjalon  6wind.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mempool: rename functions with confusing
> names
> 
> 
> 
> On 06/29/2016 06:02 PM, Wiles, Keith wrote:
> >
> > On 6/29/16, 11:00 AM, "dev on behalf of Bruce Richardson"  bounces at dpdk.org on behalf of bruce.richardson at intel.com> wrote:
> >
> >> On Wed, Jun 29, 2016 at 05:55:27PM +0200, Thomas Monjalon wrote:
> >>> 2016-06-29 14:55, Bruce Richardson:
> >>>> The mempool_count and mempool_free_count behaved contrary to what
> >>>> their names suggested. The free_count function actually returned
> >>>> the number of elements that were allocated from the pool, not the
> >>>> number unallocated as the name implied.
> 
> I agree the current API is not appropriate.
> 
> 
> >>>> Fix this by introducing two new functions to replace the old ones,
> >>>> * rte_mempool_unallocated_count to replace rte_mempool_count
> >>>> * rte_mempool_allocated_count to replace rte_mempool_free_count
> >>>
> >>> What about available/used instead of unallocated/allocated?
> >>>
> >>
> >> I don't particularly mind what the name is, to be honest. I like
> "avail"
> >> because it is shorter, but I'm a little uncertain about "used",
> >> because it implies that the entries are finished with i.e. like a
> >> used match, or tissue :-)
> >>
> >> How about "avail/in_use"?
> >
> > +1 for those names.
> 
> +1 too.
> 
> rte_mempool_avail_count()
> rte_mempool_in_use_count()
> 

Ok, I'll see about doing a V2 for review.

/Bruce


[dpdk-dev] [PATCH] doc: virtio pmd versions

2016-06-09 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mcnamara, John

> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang
 
> > +
> > +Virtio PMD Versions
> > +---
> > +
> > +Virtio driver has 3 versions of rx functions and 2 versions of tx
> > functions.
> 
> In some places RX/TX is used and in some rx/tx. I would suggest the
> uppercase versions throughout.
> 

In the commit logs, the only valid contractions allowed by the check-git-log.sh 
script are Rx and Tx

bad=$(echo "$headlines" | grep -E --color=always \
-e '\<(rx|tx|RX|TX)\>' \
 

I would therefore suggest we follow the same rules for the docs for consistency.

/Bruce


[dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW

2016-06-01 Thread Richardson, Bruce


> -Original Message-
> From: Harish Patil [mailto:harish.patil at qlogic.com]
> Sent: Wednesday, June 1, 2016 3:16 PM
> To: Thomas Monjalon 
> Cc: dev at dpdk.org; Richardson, Bruce ; Rasesh
> Mody ; Dept-Eng DPDK Dev  EngDPDKDev at qlogic.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW
> 
> >
> >2016-05-31 19:21, Harish Patil:
> >>
> >> >On Fri, May 06, 2016 at 09:21:31PM -0700, Rasesh Mody wrote:
> >> >> From: Harish Patil 
> >> >>
> >> >> Under certain scenarios, MFW periodically polls the driver for LAN
> >> >> statistics. This patch implements the osal hook to fill in the
> >> >> stats.
> >> >>
> >> >> Fixes: ffa002d318d36 ("qede: add base driver")
> >> >>
> >> >What is MFW?
> >> >
> >> >/Bruce
> >> >
> >>
> >> MFW - Management FirmWare running on the card.
> >
> >So MFW can probably be replaced by firmware in the title.
> >
> 
> Reason I didn?t use ?firmware? in the first place is because there are two
> different firmware running on the card:
> 1) MFW (management firmware - which is flashed)
> 2) Firmware (datapath  firmware - loaded by driver by reading FW file)
> 
> So, I can replace it as management firmware explicitly.
> 
> 
> Thanks,
> Harish
> 
How about firmware in the title, and then you can clarify it as management 
firmware in the message itself?

Regards,
/Bruce


[dpdk-dev] [PATCH v2] doc: fix typo in freebsd doc

2016-05-19 Thread Richardson, Bruce


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Thursday, May 19, 2016 3:39 PM
> To: dev at dpdk.org; Richardson, Bruce 
> Cc: Mcnamara, John ; De Lara Guarch, Pablo
> 
> Subject: [PATCH v2] doc: fix typo in freebsd doc
> 
> Signed-off-by: Pablo de Lara 
Acked-by: Bruce Richardson 



[dpdk-dev] [RFC PATCH] tools:new tool for system info CPU, memory and huge pages

2016-05-16 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Monday, May 16, 2016 3:32 PM
> To: Wiles, Keith 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH] tools:new tool for system info CPU,
> memory and huge pages
> 
> 2016-05-16 14:11, Wiles, Keith:
> > >2016-05-13 15:48, Wiles, Keith:
> > >> I create this new tool to combine some information and use
> /sys/devices instead. What I was hoping was some of you could try this
> script and see if it works correctly. Also I was hope to find out if this
> script is useful and what other features you would like in this type of
> tool.
> > >
> > >What is the intent of this script? Is it to be used for bug report?
> > >There already have some tools to display system informations. Why
> > >adding one more?
> > >Examples of useful tools: hwloc/lstopo, lspci, hugeadm.
> >
> > I was looking to replace the cpu_layout.py tool which uses the /procfs
> instead of /sysfs, just figured we could then add some extra information
> into this script as well. Yes, we have other tools, but some people do not
> know or use or install these tools. I was hoping this one would be able to
> display a number of things to help the developer and us in helping them
> debug problems.
> >
> > Stephen Hemminger sent an email about the use of sysfs instead of
> procfs.
> > http://dpdk.org/ml/archives/dev/2016-May/038560.html
> 
> I agree that cpu_layout.py should be removed.
> Should we implement something else? Or just point to existing tools?
> Or call existing tools from a small script?
> Is it the DPDK focus to develop and maintain such tool?

+1 for pointing to existing tools.

/Bruce


[dpdk-dev] [PATCH] pci: Add the class_id support in pci probe

2016-05-11 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, May 11, 2016 4:21 PM
> To: Yang, Ziye 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] pci: Add the class_id support in pci probe
> 
> On Wed, 11 May 2016 14:08:15 +0800
> Ziye Yang  wrote:
> 
> > This patch is used to add the class_id (class_code, subclass_code,
> > programming_interface) support for pci_device probe. With this patch,
> > it will be flexible for users to probe a class of devices by class_id.
> >
> > Signed-off-by: Ziye Yang 
> 
> I like this, and it is necessary but since rte_pci_id is a visible data
> structure it causes ABI breakage.

A notice was published for this change in 16.04, so we should be ok ABI-wise.

/Bruce


[dpdk-dev] [PATCH] librte_eal: fix wrong args operation in eal_parse_args

2016-05-11 Thread Richardson, Bruce


> -Original Message-
> From: Yang, Ziye
> Sent: Wednesday, May 11, 2016 2:39 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] librte_eal: fix wrong args operation in
> eal_parse_args
> 
> Hi Bruce,
> 
> Could you tell me the documentation file name? Then I can conduct the
> following documentation relate d patch.

The behavior is already documented in the doxygen API definitions for 
rte_eal_init().

* @return$
 *   - On success, the number of parsed arguments, which is greater or$
 * equal to zero. After the call to rte_eal_init(),$
 * all arguments argv[x] with x < ret may be modified and should$
 * not be accessed by the application.$
 *   - On failure, a negative error value.$

However, maybe you want to call it out in a different way to make it clearer 
for those using the functions. Other than that, perhaps look in the programmers 
guide document to see if it needs a mention there - or a reference to the API 
doc.

/Bruce

PS: When replying, please respond below existing emails, rather than 
top-posting. Thanks.

> 
> -----Original Message-
> From: Richardson, Bruce
> Sent: Wednesday, May 11, 2016 8:21 PM
> To: Yang, Ziye 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] librte_eal: fix wrong args operation in
> eal_parse_args
> 
> 
> 
> > -Original Message-
> > From: Yang, Ziye
> > Sent: Wednesday, May 11, 2016 12:51 PM
> > To: Richardson, Bruce 
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] librte_eal: fix wrong args operation
> > in eal_parse_args
> >
> > Hi Bruce,
> >
> > Could it be fixed later?  If not, it should be documented. I faced
> > this issued today, and found that dpdk changed my last arg.  In my
> > mind, dpdk should not change the argv[last], which will confuse the
> users.
> >
> 
> We can certainly consider changing it, but I am concerned about stability
> of existing apps. I think it needs some discussion and consensus on-list
> before a change like this is made.
> 
> For right now, definitely the behavior should be documented. Would you
> consider submitting a documentation update patch for this issue instead of
> this code change?
> 
> Thanks,
> /Bruce
> 
> > Best Regards,
> > Ziye Yang
> >
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Wednesday, May 11, 2016 7:25 PM
> > To: Yang, Ziye 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix wrong args operation
> > in eal_parse_args
> >
> > On Wed, May 11, 2016 at 01:28:21PM +0800, Ziye Yang wrote:
> > > This patch is used to fix wrong operation on user input args.
> > > eal_parse_args function should not operate the args passed by the
> > > user. If the element in argv is generated by malloc function,
> > > changing it  will cause memory issues when free the args.
> > >
> > > Signed-off-by: Ziye Yang 
> > > ---
> > >  lib/librte_eal/bsdapp/eal/eal.c   | 2 --
> > >  lib/librte_eal/linuxapp/eal/eal.c | 2 --
> > >  2 files changed, 4 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/bsdapp/eal/eal.c
> > > b/lib/librte_eal/bsdapp/eal/eal.c index 06bfd4e..0eef92d 100644
> > > --- a/lib/librte_eal/bsdapp/eal/eal.c
> > > +++ b/lib/librte_eal/bsdapp/eal/eal.c
> > > @@ -420,8 +420,6 @@ eal_parse_args(int argc, char **argv)
> > >   goto out;
> > >   }
> > >
> > > - if (optind >= 0)
> > > - argv[optind-1] = prgname;
> > >   ret = optind-1;
> > >
> > >  out:
> > > diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> > > b/lib/librte_eal/linuxapp/eal/eal.c
> > > index 8aafd51..ba9d1ac 100644
> > > --- a/lib/librte_eal/linuxapp/eal/eal.c
> > > +++ b/lib/librte_eal/linuxapp/eal/eal.c
> > > @@ -658,8 +658,6 @@ eal_parse_args(int argc, char **argv)
> > >   goto out;
> > >   }
> > >
> > > - if (optind >= 0)
> > > - argv[optind-1] = prgname;
> > >   ret = optind-1;
> > >
> > >  out:
> > This is a behaviour change in DPDK. The behaviour has always been that
> > after calling eal init, you can update your argv/argc values by the
> > number of args parsed and then parse your app args as normal, since
> > argv[0] will still point to your program name. While I agree that
> > having the current behaviour may cause some problems, changing this
> > behaviour may break applications that have been written to use the
> existing behaviour.
> >
> > Since it is only the last EAL parameter arg that is modified, I think
> > it would be acceptable to have the behaviour well documented and then
> > expect any app to store a second copy of the pointer to be modified if
> > it is needed for a subsequent free call, for example.
> >
> > /Bruce


[dpdk-dev] [PATCH] librte_eal: fix wrong args operation in eal_parse_args

2016-05-11 Thread Richardson, Bruce


> -Original Message-
> From: Yang, Ziye
> Sent: Wednesday, May 11, 2016 12:51 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] librte_eal: fix wrong args operation in
> eal_parse_args
> 
> Hi Bruce,
> 
> Could it be fixed later?  If not, it should be documented. I faced this
> issued today, and found that dpdk changed my last arg.  In my mind, dpdk
> should not change the argv[last], which will confuse the users.
> 

We can certainly consider changing it, but I am concerned about stability of 
existing apps. I think it needs some discussion and consensus on-list before a 
change like this is made.

For right now, definitely the behavior should be documented. Would you consider 
submitting a documentation update patch for this issue instead of this code 
change?

Thanks,
/Bruce

> Best Regards,
> Ziye Yang
> 
> -Original Message-
> From: Richardson, Bruce
> Sent: Wednesday, May 11, 2016 7:25 PM
> To: Yang, Ziye 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix wrong args operation in
> eal_parse_args
> 
> On Wed, May 11, 2016 at 01:28:21PM +0800, Ziye Yang wrote:
> > This patch is used to fix wrong operation on user input args.
> > eal_parse_args function should not operate the args passed by the
> > user. If the element in argv is generated by malloc function, changing
> > it  will cause memory issues when free the args.
> >
> > Signed-off-by: Ziye Yang 
> > ---
> >  lib/librte_eal/bsdapp/eal/eal.c   | 2 --
> >  lib/librte_eal/linuxapp/eal/eal.c | 2 --
> >  2 files changed, 4 deletions(-)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal.c
> > b/lib/librte_eal/bsdapp/eal/eal.c index 06bfd4e..0eef92d 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal.c
> > @@ -420,8 +420,6 @@ eal_parse_args(int argc, char **argv)
> > goto out;
> > }
> >
> > -   if (optind >= 0)
> > -   argv[optind-1] = prgname;
> > ret = optind-1;
> >
> >  out:
> > diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> > b/lib/librte_eal/linuxapp/eal/eal.c
> > index 8aafd51..ba9d1ac 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal.c
> > @@ -658,8 +658,6 @@ eal_parse_args(int argc, char **argv)
> > goto out;
> > }
> >
> > -   if (optind >= 0)
> > -   argv[optind-1] = prgname;
> > ret = optind-1;
> >
> >  out:
> This is a behaviour change in DPDK. The behaviour has always been that
> after calling eal init, you can update your argv/argc values by the number
> of args parsed and then parse your app args as normal, since argv[0] will
> still point to your program name. While I agree that having the current
> behaviour may cause some problems, changing this behaviour may break
> applications that have been written to use the existing behaviour.
> 
> Since it is only the last EAL parameter arg that is modified, I think it
> would be acceptable to have the behaviour well documented and then expect
> any app to store a second copy of the pointer to be modified if it is
> needed for a subsequent free call, for example.
> 
> /Bruce


[dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned

2016-05-04 Thread Richardson, Bruce


> -Original Message-
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Wednesday, May 4, 2016 2:43 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> aligned
> 
> On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> > On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > > Elements of struct rte_eth_dev used in the fast path.
> > > Make struct rte_eth_dev cache aligned to avoid the cases where
> > > rte_eth_dev elements share the same cache line with other structures.
> > >
> > > Signed-off-by: Jerin Jacob 
> > > ---
> > > v2:
> > > Remove __rte_cache_aligned from rte_eth_devices and keep it only at
> > > struct rte_eth_dev definition as suggested by Bruce
> > > http://dpdk.org/dev/patchwork/patch/12328/
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 2757510..48f14d5 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> > >   struct rte_eth_rxtx_callback
> *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> > >   uint8_t attached; /**< Flag indicating the port is attached */
> > >   enum rte_eth_dev_type dev_type; /**< Flag indicating the device
> > > type */ -};
> > > +} __rte_cache_aligned;
> > >
> > >  struct rte_eth_dev_sriov {
> > >   uint8_t active;   /**< SRIOV is active with 16, 32 or 64
> pools */
> > > --
> >
> > Hi Jerin,
> 
> Hi Bruce,
> 
> >
> > have you seen a performance degradation due to ethdev elements sharing
> > a cache
> 
> No. Not because of sharing the cache line.
> 
> > line? I ask because, surprisingly for me, I actually see a performance
> > regression
> 
> I see performance degradation in PMD in my setup where independent changes
> are causing the performance issue in PMD(~<100k). That's the reason I
> thought making aligned cache line stuff where ever it makes sense so that
> independent change shouldn't impact the PMD performance and this patch was
> an initiative for the same.
> 
> > when I apply the above patch. It's not a big change - perf reduction
> > of <1% - but still noticable across multiple runs using testpmd. I'm
> > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > E5-2699 v3 @ 2.30GHz]
> 
> This particular patch does not have any performance degradation in my
> setup.
> CPU: ThunderX

Ok, so I take it that this patch is performance neutral on your setup, then?
If that's the case, can we hold off on merging it on the basis that it's not 
needed and does cause a slight regression.

Thanks,
/Bruce


[dpdk-dev] [PATCH] i40e: Remove redundant fdir forward declarations.

2016-04-27 Thread Richardson, Bruce
> -Original Message-
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: Wednesday, April 27, 2016 5:20 PM
> To: Zhang, Helin 
> Cc: Rosen, Rami ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] i40e: Remove redundant fdir forward
> declarations.
> 
> On Wed, Apr 27, 2016 at 02:51:55AM +, Zhang, Helin wrote:
> >
> >
> > > -Original Message-
> > > From: Rosen, Rami
> > > Sent: Saturday, March 26, 2016 9:32 AM
> > > To: Zhang, Helin
> > > Cc: dev at dpdk.org; Rosen, Rami
> > > Subject: [PATCH] i40e: Remove redundant fdir forward declarations.
> > >
> > > This patch removes several redundant forward declarations in
> i40e_fdir.c.
> > >
> > > Signed-off-by: Rami Rosen 
> > Acked-by: Helin Zhang 
> 
> Added fixes line:
> Fixes: a778a1fa2e4e ("i40e: set up and initialize flow director")
> 
> Applied to dpdk-next-net/rel_16_04
> 
Where by 04 I obviously mean 07! :-)

/Bruce


[dpdk-dev] [PATCH v3 00/10] qede: Add qede PMD

2016-03-22 Thread Richardson, Bruce
I've had a quick scan over this patchset, and as you've probably seen I've made 
some public comments on it. General comments on the whole patchset are:
* Please run checkpatch on the patchset and clear up as many issues as you can. 
There are a number of typos called out which especially must be fixed. Both 
myself and Thomas always run checkpatch against patches before applying them. 
[I suggest using Thomas's checkpatches.sh script to do the checks as it 
disables many unnecessary warnings from checkpatch]
* Please put in commit descriptions for all patches bar those doing trivial 
things. The first three patches probably don't need a commit message, but the 
rest do.

/Bruce

> -Original Message-
> From: Rasesh Mody [mailto:rasesh.mody at qlogic.com]
> Sent: Saturday, March 19, 2016 12:53 AM
> To: thomas.monjalon at 6wind.com; Richardson, Bruce
> 
> Cc: dev at dpdk.org; ameen.rahman at qlogic.com; harish.patil at qlogic.com;
> sony.chacko at qlogic.com; Rasesh Mody 
> Subject: [PATCH v3 00/10] qede: Add qede PMD
> 
> Submitting v3 patch series for QEDE PMD. There is no code change from v2
> series except PMD version change. Earlier we had generated and tested the
> v2 series against dpdk tree then latest.
> 
> The v3 series includes:
>  - Patches generated and tested against latest dpdk-next-net
>  - Reworked MAINTAINERS patch to make it apply cleanly
>  - Incorporated Overview.rst update in the documentation patch
> 
> Please Apply.
> 
> Thanks!
> Rasesh
> 
> Rasesh Mody (10):
>   qede: Add maintainers
>   qede: Add documentation
>   qede: Add license file
>   qede: Add base driver
>   qede: Add core driver
>   qede: Add L2 support
>   qede: Add SRIOV support
>   qede: Add attention support
>   qede: Add DCBX support
>   qede: Enable PMD build
> 
>  MAINTAINERS |7 +
>  config/common_base  |   14 +
>  doc/guides/nics/index.rst   |1 +
>  doc/guides/nics/overview.rst|   78 +-
>  doc/guides/nics/qede.rst|  340 +
>  drivers/net/Makefile|1 +
>  drivers/net/qede/LICENSE.qede_pmd   |   28 +
>  drivers/net/qede/Makefile   |   95 +
>  drivers/net/qede/base/bcm_osal.c|  178 +
>  drivers/net/qede/base/bcm_osal.h|  395 +
>  drivers/net/qede/base/common_hsi.h  |  714 ++
>  drivers/net/qede/base/ecore.h   |  746 ++
>  drivers/net/qede/base/ecore_attn_values.h   |13287
> +++
>  drivers/net/qede/base/ecore_chain.h |  724 ++
>  drivers/net/qede/base/ecore_cxt.c   | 1961 
>  drivers/net/qede/base/ecore_cxt.h   |  157 +
>  drivers/net/qede/base/ecore_cxt_api.h   |   79 +
>  drivers/net/qede/base/ecore_dcbx.c  |  887 ++
>  drivers/net/qede/base/ecore_dcbx.h  |   55 +
>  drivers/net/qede/base/ecore_dcbx_api.h  |  160 +
>  drivers/net/qede/base/ecore_dev.c   | 3578 
>  drivers/net/qede/base/ecore_dev_api.h   |  497 +
>  drivers/net/qede/base/ecore_gtt_reg_addr.h  |   42 +
>  drivers/net/qede/base/ecore_gtt_values.h|   33 +
>  drivers/net/qede/base/ecore_hsi_common.h| 1912 
>  drivers/net/qede/base/ecore_hsi_eth.h   | 1912 
>  drivers/net/qede/base/ecore_hsi_tools.h | 1081 +++
>  drivers/net/qede/base/ecore_hw.c|  992 ++
>  drivers/net/qede/base/ecore_hw.h|  269 +
>  drivers/net/qede/base/ecore_hw_defs.h   |   49 +
>  drivers/net/qede/base/ecore_init_fw_funcs.c | 1275 +++
> drivers/net/qede/base/ecore_init_fw_funcs.h |  263 +
>  drivers/net/qede/base/ecore_init_ops.c  |  599 ++
>  drivers/net/qede/base/ecore_init_ops.h  |  103 +
>  drivers/net/qede/base/ecore_int.c   | 2225 +
>  drivers/net/qede/base/ecore_int.h   |  234 +
>  drivers/net/qede/base/ecore_int_api.h   |  277 +
>  drivers/net/qede/base/ecore_iov_api.h   |  933 ++
>  drivers/net/qede/base/ecore_iro.h   |  115 +
>  drivers/net/qede/base/ecore_iro_values.h|   59 +
>  drivers/net/qede/base/ecore_l2.c| 1798 
>  drivers/net/qede/base/ecore_l2.h|  151 +
>  drivers/net/qede/base/ecore_l2_api.h|  401 +
>  drivers/net/qede/base/ecore_mcp.c   | 1928 
>  drivers/net/qede/base/ecore_mcp.h   |  304 +
>  drivers/net/qede/base/ecore_mcp_api.h   |  611 ++
>  drivers/net/qede/base/ecore_proto_if.h  |   28 +
>  drivers/net/qede/base/ecore_rt_defs.h   |  446 +
>  drivers/net/qede/base/ecore_sp_api.h|   42 +
>  drivers/net/qede/base/ecore_sp_commands.c   |  525 ++
>  drivers/net/qede/base/ecore_sp_commands.h   |  137 +
>  drivers/net/qede/

[dpdk-dev] [PATCH] example/ip_pipeline: fix copy into fixed size buffer defect

2015-12-11 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mcnamara, John
> Sent: Friday, December 11, 2015 3:37 PM
> To: Zhang, Roy Fan ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] example/ip_pipeline: fix copy into fixed
> size buffer defect
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Fan Zhang
> > Sent: Friday, December 11, 2015 11:29 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] example/ip_pipeline: fix copy into fixed
> > size buffer defect
> >
> > Coverity issue: 107133
> > Fixes: eb32fe7c5574 ("examples/ip_pipeline: rework initialization
> > parameters")
> >
> > Signed-off-by: Fan Zhang 
> > Acked-by: Cristian Dumitrescu 
> > ---
> >  examples/ip_pipeline/init.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
> > index bc6d6d9..5bcb420 100644
> > --- a/examples/ip_pipeline/init.c
> > +++ b/examples/ip_pipeline/init.c
> > @@ -1068,7 +1068,10 @@ static void app_pipeline_params_get(struct
> > app_params *app,
> > uint32_t i;
> > uint32_t mempool_id;
> >
> > -   strcpy(p_out->name, p_in->name);
> > +   if (sizeof(p_in->name) > PIPELINE_NAME_SIZE)
> > +   strncpy(p_out->name, p_in->name, PIPELINE_NAME_SIZE);
> > +   else
> > +   strcpy(p_out->name, p_in->name);
> >
> > p_out->socket_id = (int) p_in->socket_id;
> >
> 
> Hi Fan,
> 
> I think there could still be issues here (depending of the size/types of
> p_out->name and p_in->name). Probably better as something like:
> 
> strncpy(p_out->name, p_in->name, PIPELINE_NAME_SIZE);
> p_out->name[PIPELINE_NAME_SIZE -1] = '\0';
> 
> John.
> --

Use snprintf to avoid having to explicitly null terminate, perhaps?
/Bruce


[dpdk-dev] [PATCH] reserve 'make install' for future use

2015-12-04 Thread Richardson, Bruce


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, December 4, 2015 3:57 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] reserve 'make install' for future use
> 
> 2015-11-30 15:26, Thomas Monjalon:
> > 2015-11-30 11:49, Bruce Richardson:
> > > On Mon, Nov 30, 2015 at 11:41:32AM +, Richardson, Bruce wrote:
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > Sent: Monday, November 30, 2015 11:27 AM
> > > > > To: Richardson, Bruce 
> > > > > Cc: Panu Matilainen ; dev at dpdk.org;
> > > > > olivier.matz at 6wind.com
> > > > > Subject: Re: [dpdk-dev] [PATCH] reserve 'make install' for
> > > > > future use
> > > > >
> > > > > 2015-11-30 11:08, Richardson, Bruce:
> > > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > > Why is it a step in the right direction?
> > > > > > >
> > > > > > > We just need to install the files in a different hierarchy
> > > > > > > and adapt the makefiles to be able to compile an application
> > > > > > > while keeping the RTE_SDK variable to specify the root
> > > > > > > directory (previously built thanks to DESTDIR).
> > > > > > > As the hierarchy could be tuned, we need more variables, e.g.:
> > > > > > >   DPDK_INC_DIR (default = RTE_SDK/include/dpdk)
> > > > > > >   DPDK_LIB_DIR (default = RTE_SDK/lib)
> > > > > > >
> > > > > > > While doing it, we can have a specific handling of T= to
> > > > > > > keep compatibility with the current (old) syntax.
> > > > > > >
> > > > > > > What have I missed?
> > > > > >
> > > > > > I'm not sure our existing "make install" is suitable for use
> > > > > > for this,
> > > > > without having it heavily overloaded. The existing T= behavior
> > > > > has support for wildcards and compiling multiple instances at
> > > > > the same time - something that won't work with a scheme to
> > > > > actually install DPDK throughout the filesystem hierarchy.
> > > > > Having it sometimes behave as now, and sometimes behave as a
> > > > > standard make install is a bad idea IMHO, as it confuses things.
> > > > > Having lots of extra environment variables is also not a great
> idea, to my mind.
> > > > >
> > > > > Yes I agree.
> > > > > I forgot to mention it, but in my idea, we can drop the support
> > > > > for multiple targets. So the T= compatibility would be only a
> > > > > shortcut to do "make config" and name the build directory based on
> the template name.
> > > > >
> > > > > About the environment variables:
> > > > > An application requires CFLAGS and LDFLAGS (at least). The
> > > > > standard way to provide them is pkgconfig (not implemented yet).
> > > > > For applications using the DPDK makefiles, the only input is
> RTE_SDK.
> > > > > When allowing more tuning in paths, we need more variables when
> > > > > using the DPDK makefiles to build an application.
> > > > >
> > > > > > My opinion is that we should rename our existing "make
> > > > > > install" to
> > > > > something more suitable - my patch suggestion was "make sdk" but
> > > > > it could be "make target" or something else if people prefer.
> > > > > Once that is done, we can then look to implement a proper "make
> > > > > install" command that works in a standard way, perhaps alongside a
> configure script of some description.
> > > > >
> > > > > I think we don't need to rename or move some code.
> > > > > Just drop and replace some of them.
> > > > >
> > > > > The configure script is a great idea but it is a totally different
> idea.
> > > > > I do not think that installation and configuration should be
> related.
> > > > > Please let's consider "make install" first.
> > > > >
> > > > > > For an easy enough solution, I would look to apply this patch
> > > > > > to create
> >

[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Richardson, Bruce


> -Original Message-
> From: Aaron Conole [mailto:aconole at redhat.com]
> Sent: Tuesday, December 1, 2015 3:31 PM
> To: Richardson, Bruce 
> Cc: Panu Matilainen ; dev at dpdk.org
> Subject: Re: [dpdk-dev] 2.3 Roadmap
> 
> Bruce Richardson  writes:
> > On Tue, Dec 01, 2015 at 04:58:08PM +0200, Panu Matilainen wrote:
> >> On 12/01/2015 04:48 PM, Vincent JARDIN wrote:
> >> >On 01/12/2015 15:27, Panu Matilainen wrote:
> >> >>The problem with that (unless I'm missing something here) is that
> >> >>KNI requires using out-of-tree kernel modules which makes it pretty
> >> >>much a non-option for distros.
> >> >
> >> >It works fine with some distros. I do not think it should be an
> argument.
> >>
> >> Its not a question of *working*, its that out-of-tree kernel modules
> >> are considered unsupportable by the kernel people. So relying on KNI
> >> would make the otherwise important and desireable tcpdump feature
> >> non-existent on at least Fedora and RHEL where such modules are
> >> practically outright banned by distro policies.
> >>
> >>- Panu -
> >
> > Yes, KNI is a bit of a problem right now in that way.
> >
> > How about a solution which is just based around the idea of setting up
> > a generic port mirroring callback? Hopefully in the future we can get
> > KNI exposed as a PMD, and we already have a ring PMD, and could
> > possibly do a generic file/fifo PMD.
> > Between the 3, we could then have multiple options for intercepting
> > traffic going in/out of an app. The callback would just have to copy
> > the traffic to the selected interface before returning it to the app as
> normal?
> >
> > /Bruce
> 
> I'm actually working on a patch series that uses a TAP device (it's
> currently been only minorly tested) called back from the port input. The
> benefit is no dependancy on kernel modules (just TUN/TAP support). I don't
> have a way of signaling sampling, so right now, it's just drinking from
> the firehose. Nothing I'm ready to put out publicly (because it's ugly -
> just a PoC), but it allows a few things:
> 
> 1) on demand on/off using standard linux tools (ifconfig/ip to set tap
>device up/down)
> 2) Can work with any tool which reads off of standard linux interfaces
>(tcpdump/wireshark work out of the box, but you could plug in any
>pcap or non-pcap tool)
> 3) Doesn't require changes to the application (no command line switches
>during startup, etc.)
> 
> As I said, I'm not ready to put it out there publicly, because I haven't
> had a chance to check the performance, and it's definitely not following
> any kind of DPDK-like coding style. Just wanted to throw this out as food
> for thought - if you think this approach is worthwhile I can try to
> prioritize it, at least to get an RFC series out.
> 
> -Aaron

Once I had a generic file-handling PMD written, I was then considering extending
it to work with TUN/TAP too. :-)
I think a TAP PMD would be useful for the downstream distros who can't package
KNI as it is right now.

/Bruce


[dpdk-dev] Unable to configure ethdev in secondary process using ring PMD

2015-11-30 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexey Bogdanenko
> Sent: Monday, November 30, 2015 4:17 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Unable to configure ethdev in secondary process using
> ring PMD
> 
> Hello,
> 
> I would like to setup communication between two existing DPDK applications
> and run them on the same host.
> 
> "Connecting their ports" in some way in order not to rewrite the
> applications would be very desirable. Specifically, I would like one
> process to send packets and the second process to receive the packets
> using rte_eth_tx_burst() and rte_eth_rx_burst() respectively.
> 
> The most straightforward way to accomplish this seems to be by using ring
> based PMD API as described in the documentation [1] and email [2].
> To adapt the example from the documentation to multi-process scenario, I
> call rte_ring_create() and rte_eth_from_rings() in the primary process,
> rte_ring_lookup() and rte_eth_from_rings() in the secondary process.
> After that each process calls rte_eth_dev_configure().
> 
> Unfortunately, the function returns -1001 in the secondary process, which
> is explained in debug log:
> 
> PMD: rte_eth_dev_configure: Cannot run in secondary processes
> 
> Is it possible to connect the applications as described above? Any advice
> would be appreciated.
> 
> References:
> 
> 1. Network Interface Controller Drivers. Chapter 8.
> Libpcap and Ring Based Poll Mode Drivers.
> 
> 2. DPDK ML. Fri Dec 6 07:22:06 CET 2013. How to know corresponding device
> from port number. Tetsuya.Mukawa
> 
> Thanks,
> 
> Alexey Bogdanenko

Hi Alexey,

The ring PMDs returned from eth_from_rings should be all ready to be used 
without having to explicitly configure it or set up the queues.

/Bruce



[dpdk-dev] [PATCH] reserve 'make install' for future use

2015-11-30 Thread Richardson, Bruce


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 30, 2015 11:27 AM
> To: Richardson, Bruce 
> Cc: Panu Matilainen ; dev at dpdk.org;
> olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH] reserve 'make install' for future use
> 
> 2015-11-30 11:08, Richardson, Bruce:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Why is it a step in the right direction?
> > >
> > > We just need to install the files in a different hierarchy and adapt
> > > the makefiles to be able to compile an application while keeping the
> > > RTE_SDK variable to specify the root directory (previously built
> > > thanks to DESTDIR).
> > > As the hierarchy could be tuned, we need more variables, e.g.:
> > >   DPDK_INC_DIR (default = RTE_SDK/include/dpdk)
> > >   DPDK_LIB_DIR (default = RTE_SDK/lib)
> > >
> > > While doing it, we can have a specific handling of T= to keep
> > > compatibility with the current (old) syntax.
> > >
> > > What have I missed?
> > >
> >
> >
> > I'm not sure our existing "make install" is suitable for use for this,
> without having it heavily overloaded. The existing T= behavior has support
> for wildcards and compiling multiple instances at the same time -
> something that won't work with a scheme to actually install DPDK
> throughout the filesystem hierarchy. Having it sometimes behave as now,
> and sometimes behave as a standard make install is a bad idea IMHO, as it
> confuses things. Having lots of extra environment variables is also not a
> great idea, to my mind.
> 
> Yes I agree.
> I forgot to mention it, but in my idea, we can drop the support for
> multiple targets. So the T= compatibility would be only a shortcut to do
> "make config" and name the build directory based on the template name.
> 
> About the environment variables:
> An application requires CFLAGS and LDFLAGS (at least). The standard way to
> provide them is pkgconfig (not implemented yet).
> For applications using the DPDK makefiles, the only input is RTE_SDK.
> When allowing more tuning in paths, we need more variables when using the
> DPDK makefiles to build an application.
> 
> > My opinion is that we should rename our existing "make install" to
> something more suitable - my patch suggestion was "make sdk" but it could
> be "make target" or something else if people prefer. Once that is done, we
> can then look to implement a proper "make install" command that works in a
> standard way, perhaps alongside a configure script of some description.
> 
> I think we don't need to rename or move some code.
> Just drop and replace some of them.
> 
> The configure script is a great idea but it is a totally different idea.
> I do not think that installation and configuration should be related.
> Please let's consider "make install" first.
> 
> > For an easy enough solution, I would look to apply this patch to create
> "make sdk" and also http://dpdk.org/dev/patchwork/patch/8076/ to have a
> "make install" command that works in the build dir. That way:
> > * you can have existing behavior using "make sdk T="
> > * you can have standard(ish) configure/make/make install behavior using:
> > make config T=
> > cd build
> > make
> > make install
> >   and the "make config" step can subsequently be wrapped in a configure
> script to eliminate the need to know what the best target to use is, etc.
> 
> As Panu commented, I do not think it is a good idea to have different
> behaviours inside and outside of the build directory.
> I would even say that this embedded makefile is only confusing and should
> be dropped.
> We need to have *one* right building methods, not to bring more confusion.

I disagree. I don't think we can have *one* right building method, because to
do so means completely throwing away our existing methods of building DPDK
and using sample applications. That general method, using RTE_SDK and RTE_TARGET
needs to be supported for some time for those projects already familiar with it
and using it.
As well as this, we also need a sane way of building DPDK inside the "build" 
directory, and having a "make install" target that installs the libraries
and headers inside /usr/local (or whatever was specified as $prefix).

With regards to different behavior, since different targets are provided, I
don't see it as a problem. In the root directory, "make config" and "make sdk"
are provided for backward compatibility. Inside the build directory you have
your standard "make" and "make install" commands. Since the command set is
very limited, it's easy enough to print a suitable error when the wrong
command is used in the wrong place. 

Yes, I would like the ideal state where we have one set of build commands that
are run from just one location. However, I don't think we can get to that 
objective
without going through a transition phase where we support both old and new 
options.

/Bruce



[dpdk-dev] [PATCH] reserve 'make install' for future use

2015-11-30 Thread Richardson, Bruce


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, November 27, 2015 5:33 PM
> To: Panu Matilainen ; Richardson, Bruce
> 
> Cc: dev at dpdk.org; olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH] reserve 'make install' for future use
> 
> 2015-11-25 10:48, Panu Matilainen:
> > On 11/24/2015 06:54 PM, Bruce Richardson wrote:
> > > On Fri, Nov 06, 2015 at 02:04:54PM +0100, Thomas Monjalon wrote:
> > >> 2015-11-06 12:57, Bruce Richardson:
> > >>> So, any thoughts or comments on this? There has been lots of
> > >>> discussion in this general area but nothing yet going into the
> release to try and improve the situation.
> > >>>
> > >>> Are we just going to kick the problem down the road to the 2.3
> release?
> > >>
> > >> I plan to check these patches in the coming days for an integration
> in 2.2.
> > >>
> > > Anything further on this?
> > > Any thoughts from anyone else about this whole area of a saner
> > > build/install system for DPDK and the various patches floating around.
> >
> > Well, it seems we wont have a sane "make install" in 2.2 yet, but this
> > is at least a step in the right direction so +1 from me.
> 
> Why is it a step in the right direction?
> 
> We just need to install the files in a different hierarchy and adapt the
> makefiles to be able to compile an application while keeping the RTE_SDK
> variable to specify the root directory (previously built thanks to
> DESTDIR).
> As the hierarchy could be tuned, we need more variables, e.g.:
>   DPDK_INC_DIR (default = RTE_SDK/include/dpdk)
>   DPDK_LIB_DIR (default = RTE_SDK/lib)
> 
> While doing it, we can have a specific handling of T= to keep
> compatibility with the current (old) syntax.
> 
> What have I missed?
> 


I'm not sure our existing "make install" is suitable for use for this, without 
having it heavily overloaded. The existing T= behavior has support for 
wildcards and compiling multiple instances at the same time - something that 
won't work with a scheme to actually install DPDK throughout the filesystem 
hierarchy. Having it sometimes behave as now, and sometimes behave as a 
standard make install is a bad idea IMHO, as it confuses things. Having lots of 
extra environment variables is also not a great idea, to my mind.

My opinion is that we should rename our existing "make install" to something 
more suitable - my patch suggestion was "make sdk" but it could be "make 
target" or something else if people prefer. Once that is done, we can then look 
to implement a proper "make install" command that works in a standard way, 
perhaps alongside a configure script of some description. 

For an easy enough solution, I would look to apply this patch to create "make 
sdk" and also http://dpdk.org/dev/patchwork/patch/8076/ to have a "make 
install" command that works in the build dir. That way:
* you can have existing behavior using "make sdk T="
* you can have standard(ish) configure/make/make install behavior using:
make config T=
cd build
make
make install
  and the "make config" step can subsequently be wrapped in a configure script 
to eliminate the need to know what the best target to use is, etc.



/Bruce


[dpdk-dev] [PATCH 1/1] app/test: create ring and ethdevs in pmd_ring_autotest

2015-11-24 Thread Richardson, Bruce


> -Original Message-
> From: Iremonger, Bernard
> Sent: Tuesday, November 24, 2015 4:29 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/1] app/test: create ring and ethdevs in
> pmd_ring_autotest
> 
> Hi Bruce,
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Tuesday, November 24, 2015 4:14 PM
> > To: Iremonger, Bernard 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/1] app/test: create ring and ethdevs
> > in pmd_ring_autotest
> >
> > On Mon, Nov 23, 2015 at 03:20:46PM +, Bernard Iremonger wrote:
> > > Use command line option --vdev=eth_ring0 to create port 0.
> > > Create two rings and five ethdevs in test_pmd_ring for ports 1 to 5.
> > > Improve test output by adding the port number to printf statements,
> > > and adding a printf describing each test.
> > >
> > > revise ring-based PMD doc to match latest ring PMD code.
> > >
> > > Signed-off-by: Bernard Iremonger 
> >
> > The doc changes are not relevant to the unit test fixes, so should be
> > in a separate patch. The test changes themselves are good though, and
> > actually makes the test runable without having to find a very specific
> > command-line incantation to make things work right.
> >
> > Subject to this being split into two:
> > Acked-by: Bruce Richardson 
> 
> Will I keep your ack on both patches when I split them?
> 
> Regards,
> 
> Bernard.
> 
Sure, feel free to.

/Bruce


[dpdk-dev] [PATCH] ring: Fix memory leakage in rte_pmd_ring_devuninit()

2015-11-20 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mcnamara, John
> Sent: Friday, November 20, 2015 12:32 PM
> To: Mauricio Vasquez B ;
> dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ring: Fix memory leakage in
> rte_pmd_ring_devuninit()
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mauricio Vasquez
> > B
> > Sent: Wednesday, November 18, 2015 10:29 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] ring: Fix memory leakage in
> > rte_pmd_ring_devuninit()
> >
> > When freeing the device, it is also necessary to free rx_queues and
> > tx_queues
> >
> > Signed-off-by: Mauricio Vasquez B
> > 
> > ---
> >  drivers/net/ring/rte_eth_ring.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ring/rte_eth_ring.c
> > b/drivers/net/ring/rte_eth_ring.c index 9a31bce..e091e4f 100644
> > --- a/drivers/net/ring/rte_eth_ring.c
> > +++ b/drivers/net/ring/rte_eth_ring.c
> > @@ -582,6 +582,9 @@ rte_pmd_ring_devuninit(const char *name)
> > return -ENODEV;
> >
> > eth_dev_stop(eth_dev);
> > +
> > +   rte_free(eth_dev->data->rx_queues);
> > +   rte_free(eth_dev->data->tx_queues);
> > rte_free(eth_dev->data->dev_private);
> > rte_free(eth_dev->data);
> 
> Hi,
> 
> This should test for eth_dev->data before freeing the members. Like:
> 
> if (eth_dev->data) {
> rte_free(eth_dev->data->rx_queues);
> rte_free(eth_dev->data->tx_queues);
> rte_free(eth_dev->data->dev_private);
> }
> 
> Thanks,
> 
> John

That was my thought initially too, but since this is an uninit routine, the 
data field must already have been set up correctly by the init/creation 
function. 
That being said, the check does no harm, so we might as well add it.

/Bruce


[dpdk-dev] [PATCH v3 2/4] ethdev: move error checking macros to header

2015-11-10 Thread Richardson, Bruce


> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> Sent: Tuesday, November 10, 2015 4:08 PM
> To: Richardson, Bruce 
> Cc: Stephen Hemminger ; Thomas Monjalon
> ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 2/4] ethdev: move error checking macros
> to header
> 
> On Mon, Nov 09, 2015 at 02:02:28PM +, Richardson, Bruce wrote:
> [...]
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> [...]
> > > Untested but I guess modifying that function accordingly would look
> like:
> > >
> > >  static inline void
> > >  rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)  {
> > >  va_list ap;
> > >  va_start(ap, fmt);
> > >
> > >  static __thread char buffer[vsnprintf(NULL, 0, fmt, ap)];
> > >
> > >  va_end(ap);
> > >va_start(ap, fmt);
> > >  vsnprintf(buffer, sizeof(buffer), fmt, ap);
> > >va_end(ap);
> > >  rte_log(RTE_LOG_ERR, RTE_LOGTYPE_PMD, "%s: %s", func_name,
> > > buffer);  }
> > >
> >
> > Looks a much better option.
> >
> > From this, though, I assume then that we are only looking to support the
> -pedantic flag in conjuction with c99 mode or above. Supporting -pedantic
> with the pre-gcc-5 versions won't allow that to work though, as variably
> sized arrays only came in with c99, and were gnu extensions before that.
> 
> Right, -pedantic must follow a given standard such as -std=gnu99 otherwise
> it's meaningless.
> 
> However pre-GCC 5 is fine for most if not all features we use, see:
> 
>  https://gcc.gnu.org/c99status.html
> 
> Mixed code and declarations are supported since GCC 3.0, __VA_ARGS__ in
> macros since GCC 2.95 and variable length arrays since GCC 0.9, so as long
> as we use a version that implements -std=gnu99 (or -std=c99 to be really
> pedantic), it's fine.
> 
> Besides DPDK already uses C99 extensively, even a few C11 features (such
> as
> embedded anonymous struct definitions) currently supported in C99 mode as
> compiler extensions. I think we can safely ignore compilers that don't
> support common C99 features.
> 
> --
> Adrien Mazarguil
> 6WIND

Actually my point was slightly different. 
If we are supporting "-pedantic" in header files because "we don't know what 
compiler flags users are going to pass when", then we need to support it in C90 
mode to do the job properly. If you take gcc 4.8 and compile some code with 
"-pedantic" as the only C-flag you are going to get lots of errors because it 
will default to C90 mode with pedantic, which means no varargs macros at all. 
This limits the usefulness of supporting pedantic flag at all in our header 
files, since we only support it in certain situations with non-latest compilers.

/Bruce


[dpdk-dev] [PATCH v3 2/4] ethdev: move error checking macros to header

2015-11-09 Thread Richardson, Bruce


> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> Sent: Monday, November 9, 2015 1:39 PM
> To: Richardson, Bruce 
> Cc: Stephen Hemminger ; Thomas Monjalon
> ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 2/4] ethdev: move error checking macros
> to header
> 
> On Fri, Nov 06, 2015 at 05:22:27PM +, Bruce Richardson wrote:
> > On Fri, Nov 06, 2015 at 05:10:07PM +, Bruce Richardson wrote:
> > > On Thu, Nov 05, 2015 at 04:09:18PM +0100, Adrien Mazarguil wrote:
> > > >
> > > > I won't argue against this as it's obviously more complex than the
> original
> > > > method, however note that users of the RTE_PMD_DEBUG_TRACE() macro
> do not
> > > > have to modify their code. They shouldn't care about the
> implementation.
> > > >
> > > > Also note that we can do much cleaner code if we drop the all macros
> > > > implementation using a (much easier to debug) static inline
> function,
> > > > only perhaps with a wrapper macro that provides __LINE__, __func__
> and
> > > > __FILE__ as arguments. Nontrival code shouldn't be done in macros
> anyway.
> > > >
> > > Getting something working with __FILE__ and probably __LINE__ would be
> easy enough
> > > with a helper macro, but __func__ is not so easy as it's not a
> preprocessor symbol
> > > [since the pre-processor has no idea what function you are in].
> > >
> > > However, using func, here is the best I've come up with so far. It's
> not that
> > > pretty, but it's probably easier to work with than the macro version.
> > >
> > >  #ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > >  -#define RTE_PMD_DEBUG_TRACE(fmt, args...) \
> > >  -   RTE_LOG(ERR, PMD, "%s: " fmt, __func__, ## args)
> > >  +#define RTE_PMD_DEBUG_TRACE(...) \
> > >  +   rte_pmd_debug_trace(__func__, __VA_ARGS__)
> > >  +
> > >  +static inline void
> > >  +rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
> > >  +{
> > >  +   static __thread char buffer[128];
> > >  +   char *out_buf = buffer;
> > >  +   unsigned count;
> > >  +   va_list ap;
> > >  +
> > >  +   count = snprintf(buffer, sizeof(buffer), "%s: %s", func_name,
> fmt);
> > >  +   if (count >= sizeof(buffer)) { // truncated output
> > >  +   char *new_buf = malloc(count + 1);
> > >  +   if (new_buf == NULL) // no memory, just print 128
> chars
> > >  +   goto print_buffer;
> > >  +   snprintf(new_buf, count + 1, "%s: %s", func_name,
> fmt);
> > >  +   va_start(ap, fmt);
> > >  +   rte_vlog(RTE_LOG_ERR, RTE_LOGTYPE_PMD, buffer, ap);
> > >  +   va_end(ap);
> > >  +   free(new_buf);
> > >  +   return;
> > >  +   }
> > >  +
> > >  +print_buffer:
> > >  +   va_start(ap, fmt);
> > >  +   rte_vlog(RTE_LOG_ERR, RTE_LOGTYPE_PMD, out_buf, ap);
> > >  +   va_end(ap);
> > >  +}
> > >   #else
> > >   #define RTE_PMD_DEBUG_TRACE(fmt, args...)
> > >   #endif
> > >
> > > Comments or improvements?
> 
> Such a function shouldn't malloc() anything. The entire line should fit on
> the stack (crashing is fine if it does not, then it's probably a bug). We
> did something in two passes along these lines in mlx5_defs.h (not pretty
> but
> quite useful):
> 
>  /* Allocate a buffer on the stack and fill it with a printf format
> string. */
>  #define MKSTR(name, ...) \
>  char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
>  \
>  snprintf(name, sizeof(name), __VA_ARGS__)
> 
> Untested but I guess modifying that function accordingly would look like:
> 
>  static inline void
>  rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
>  {
>  va_list ap;
>  va_start(ap, fmt);
> 
>  static __thread char buffer[vsnprintf(NULL, 0, fmt, ap)];
> 
>  va_end(ap);
>va_start(ap, fmt);
>  vsnprintf(buffer, sizeof(buffer), fmt, ap);
>va_end(ap);
>  rte_log(RTE_LOG_ERR, RTE_LOGTYPE_PMD, "%s: %s", func_name,
> buffer);
>  }
> 

Looks a much better option.

>From this, though, I assume then that we are only looking to support the 
>-pedantic flag in conjuction with c99 mode or above. Supporting -pedantic with 
>the pre-gcc-5 versions won't allow that to work though, as variably sized 
>arrays only came in with c99, and were gnu extensions before that.

Regards,
/Bruce



[dpdk-dev] [PATCH v3 2/4] ethdev: move error checking macros to header

2015-11-06 Thread Richardson, Bruce


> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> 
> On Fri, Nov 06, 2015 at 02:39:31PM +, Richardson, Bruce wrote:
> [...]
> > > > Hi Adrien,
> > > >
> > > > I'm trying to dig into this a bit more now, and try out using a
> > > > static inline function, but I'm having trouble getting DPDK to
> > > > compile with the mlx drivers turned on in the config. I'm trying
> > > > to follow the
> > > instructions here:
> > > > http://dpdk.org/doc/guides/nics/mlx4.html, but it's not clearly
> > > > called out what requirements are for compilation vs requirements
> > > > for running
> > > the PMD.
> > > >
> > > > I'm running Fedora 23, and installed the libibverbs-devel package,
> > > > but when I compile I get the following error:
> > > >
> > > > == Build drivers/net/mlx4
> > > >   CC mlx4.o
> > > >   /home/bruce/ethdev-cleanup/drivers/net/mlx4/mlx4.c: In function
> > > ?txq_cleanup?:
> > > >   /home/bruce/ethdev-cleanup/drivers/net/mlx4/mlx4.c:886:37: error:
> > > storage size of ?params? isn?t known
> > > > struct ibv_exp_release_intf_params params;
> > > >^ compilation terminated
> > > > due to -Wfatal-errors.
> > > >
> > > > Any suggestions on the fix for this?
> > >
> > > This is a known issue, libibverbs-devel package from Fedora 23 most
> > > likely does not support extended types and functions required by
> > > mlx4. You should remove the packages that come with your
> > > distribution and install libraries versions from Mellanox OFED as
> described in the next section:
> > >
> > >  http://dpdk.org/doc/guides/nics/mlx4.html#getting-mellanox-ofed
> > >
> > > Note: no need to fully install OFED for compilation checks, you can
> > > extract an updated libibverbs package from the archive.
> > >
> > > --
> > > Adrien Mazarguil
> > > 6WIND
> >
> > Hi again,
> >
> > I've installed the libibverbs and libibverbs-devel packages from the
> > mellanox site, but I'm still getting the same error. Anything else I
> might be missing?
> >
> > $ rpm -qa | grep mlnx
> > libibverbs-devel-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
> > libmlx5-1.0.2mlnx1-OFED.3.1.1.0.3.x86_64
> > libmlx4-1.0.6mlnx1-OFED.3.1.1.0.0.x86_64
> > libibverbs-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
> > libmlx4-devel-1.0.6mlnx1-OFED.3.1.1.0.0.x86_64
> > libmlx5-devel-1.0.2mlnx1-OFED.3.1.1.0.3.x86_64
> 
> That's weird, 'struct ibv_exp_release_intf_param' must be defined in
> /usr/include/infiniband/verbs_exp.h, itself included by
> infiniband/verbs.h, both normally part of the libibverbs-devel package
> above.
> 
> Make sure you don't have an old version of infiniband/verbs.h somewhere
> else such as in /usr/local/include after a manual compilation of
> libibverbs.
> 
> --
> Adrien Mazarguil
> 6WIND

Thanks, that fixed it. There was a copy of the verbs headers in 
/usr/local/include, which is strange because I never remember having ever tried 
compiling up ibverbs before.

Anyway, problem solved for now.
Thanks for your help.
/Bruce


[dpdk-dev] [PATCH v3 2/4] ethdev: move error checking macros to header

2015-11-06 Thread Richardson, Bruce
> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> 
> Hi Bruce,
> 
> On Fri, Nov 06, 2015 at 11:52:44AM +, Bruce Richardson wrote:
> > +Adrien on To: line
> >
> > Email user/client fail on original. :-(
> >
> > - Forwarded message from Bruce Richardson
> >  -
> >
> > Date: Fri, 6 Nov 2015 11:49:05 +
> > From: Bruce Richardson 
> > To: Stephen Hemminger , Thomas Monjalon
> > , dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3 2/4] ethdev: move error checking
> > macros to header
> > User-Agent: Mutt/1.5.23 (2014-03-12)
> >
> > On Thu, Nov 05, 2015 at 04:09:18PM +0100, Adrien Mazarguil wrote:
> > > Bruce is asking for a consensus about -pedantic, whether we want to
> > > do the extra effort to support it in DPDK. Since I like checking for
> > > -pedantic errors, it's enabled for mlx4 and mlx5 when compiling
> > > these drivers in debugging mode. There is currently no established
> rule in DPDK against this.
> > >
> > > I'm arguing that most C headers (C compiler, libc, most libraries,
> > > even the Linux kernel in uapi to an extent) provide standards
> > > compliant includes because they cannot predict or force particular
> > > compilation flags on user applications.
> > >
> > > If we consider DPDK as a system wide library, I think we should do
> > > it as well in all installed header files. If we choose not to, then
> > > we must document that our code is not standard, -pedantic is
> > > unsupported and I'll have to drop it from mlx4 and mlx5.
> > >
> > > --
> > > Adrien Mazarguil
> > > 6WIND
> >
> > Hi Adrien,
> >
> > I'm trying to dig into this a bit more now, and try out using a static
> > inline function, but I'm having trouble getting DPDK to compile with
> > the mlx drivers turned on in the config. I'm trying to follow the
> instructions here:
> > http://dpdk.org/doc/guides/nics/mlx4.html, but it's not clearly called
> > out what requirements are for compilation vs requirements for running
> the PMD.
> >
> > I'm running Fedora 23, and installed the libibverbs-devel package, but
> > when I compile I get the following error:
> >
> > == Build drivers/net/mlx4
> >   CC mlx4.o
> >   /home/bruce/ethdev-cleanup/drivers/net/mlx4/mlx4.c: In function
> ?txq_cleanup?:
> >   /home/bruce/ethdev-cleanup/drivers/net/mlx4/mlx4.c:886:37: error:
> storage size of ?params? isn?t known
> > struct ibv_exp_release_intf_params params;
> >^ compilation terminated due to
> > -Wfatal-errors.
> >
> > Any suggestions on the fix for this?
> 
> This is a known issue, libibverbs-devel package from Fedora 23 most likely
> does not support extended types and functions required by mlx4. You should
> remove the packages that come with your distribution and install libraries
> versions from Mellanox OFED as described in the next section:
> 
>  http://dpdk.org/doc/guides/nics/mlx4.html#getting-mellanox-ofed
> 
> Note: no need to fully install OFED for compilation checks, you can
> extract an updated libibverbs package from the archive.
> 
> --
> Adrien Mazarguil
> 6WIND

Hi again,

I've installed the libibverbs and libibverbs-devel packages from the mellanox 
site, 
but I'm still getting the same error. Anything else I might be missing?

$ rpm -qa | grep mlnx
libibverbs-devel-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
libmlx5-1.0.2mlnx1-OFED.3.1.1.0.3.x86_64
libmlx4-1.0.6mlnx1-OFED.3.1.1.0.0.x86_64
libibverbs-1.1.8mlnx1-OFED.3.1.1.0.0.x86_64
libmlx4-devel-1.0.6mlnx1-OFED.3.1.1.0.0.x86_64
libmlx5-devel-1.0.2mlnx1-OFED.3.1.1.0.3.x86_64

Regards,
/Bruce


[dpdk-dev] [RFC ][PATCH] Introduce RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter

2015-11-03 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> Sent: Tuesday, November 3, 2015 4:53 PM
> To: Ananyev, Konstantin 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC ][PATCH] Introduce
> RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter
> 
> On Tue, Nov 03, 2015 at 04:28:00PM +, Ananyev, Konstantin wrote:
> >
> >
> > > -Original Message-
> > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > > Sent: Tuesday, November 03, 2015 4:19 PM
> > > To: Ananyev, Konstantin
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC ][PATCH] Introduce
> > > RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter
> > >
> > > On Tue, Nov 03, 2015 at 03:57:24PM +, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > > > Sent: Tuesday, November 03, 2015 3:52 PM
> > > > > To: dev at dpdk.org
> > > > > Subject: [dpdk-dev] [RFC ][PATCH] Introduce
> > > > > RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter
> > > > >
> > > > > rte_ring implementation needs explicit memory barrier in weakly
> > > > > ordered architecture like ARM unlike strongly ordered
> > > > > architecture like X86
> > > > >
> > > > > Introducing RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration to
> > > > > abstract such dependency so that other weakly ordered
> > > > > architectures can reuse this infrastructure.
> > > >
> > > > Looks a bit clumsy.
> > > > Please try to follow this suggestion instead:
> > > > http://dpdk.org/ml/archives/dev/2015-October/025505.html
> > >
> > > Make sense. Do we agree on a macro that is defined based upon
> > > RTE_ARCH_STRONGLY_ORDERED_MEM_OP to remove clumsy #ifdef every where ?
> >
> > Why do we need that macro at all?
> > Why just not have architecture specific macro as was discussed in that
> thread?
> >
> > So for intel somewhere inside
> > lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >
> > it would be:
> >
> > #define rte_smp_wmb()   rte_compiler_barrier()
> >
> > For arm inside lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >
> > #define rte_smp_wmb()   rte_wmb()
> 
> I am not sure about the other architecture but in armv8 device memory
> (typically mapped through NIC PCIe BAR space) are strongly ordered.
> So there is one more dimension to the equation(normal memory or device
> memory).
> IMO rte_smp_wmb() -> rte_wmb() mapping to deal with device memory may not
> be correct on arm64 ?
> 
> Thoughts ?
> 
In cases like that I don't think barriers are needed on any platform so the 
proposed scheme will work fine. It's up the driver writer to know about when 
they are writing to device BARs or not.

/Bruce


[dpdk-dev] [RFC ][PATCH] Introduce RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter

2015-11-03 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> Sent: Tuesday, November 3, 2015 4:19 PM
> To: Ananyev, Konstantin 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC ][PATCH] Introduce
> RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter
> 
> On Tue, Nov 03, 2015 at 03:57:24PM +, Ananyev, Konstantin wrote:
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Tuesday, November 03, 2015 3:52 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [RFC ][PATCH] Introduce
> > > RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration parameter
> > >
> > > rte_ring implementation needs explicit memory barrier in weakly
> > > ordered architecture like ARM unlike strongly ordered architecture
> > > like X86
> > >
> > > Introducing RTE_ARCH_STRONGLY_ORDERED_MEM_OPS configuration to
> > > abstract such dependency so that other weakly ordered architectures
> > > can reuse this infrastructure.
> >
> > Looks a bit clumsy.
> > Please try to follow this suggestion instead:
> > http://dpdk.org/ml/archives/dev/2015-October/025505.html
> 
> Make sense. Do we agree on a macro that is defined based upon
> RTE_ARCH_STRONGLY_ORDERED_MEM_OP to remove clumsy #ifdef every where ?
> 
> Jerin

Yes to the single-defined barrier macro.
However, for what controls it, is it really worthwhile defining a new RTE_ 
variable for it? Can we not base it on RTE_ARCH directly?

/Bruce


[dpdk-dev] [PATCH v3] hash: fix scaling by reducing contention

2015-10-30 Thread Richardson, Bruce


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Friday, October 30, 2015 2:37 PM
> To: dev at dpdk.org
> Cc: Richardson, Bruce ; De Lara Guarch, Pablo
> 
> Subject: [PATCH v3] hash: fix scaling by reducing contention
> 
> From: "De Lara Guarch, Pablo" 
> 
> If using multiple cores on a system with hardware transactional memory
> support, thread scaling does not work, as there was a single point in the
> hash library which is a bottleneck for all threads, which is the
> "free_slots" ring, which stores all the indices of the free slots in the
> table.
> 
> This patch fixes the problem, by creating a local cache per logical core,
> which stores locally indices of free slots, so most times, writer threads
> will not interfere each other.
> 
> Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
> 
> Signed-off-by: Pablo de Lara 

Acked-by: Bruce Richardson 


[dpdk-dev] [PATCH v2 2/2] lib/lpm:fix an initialization issue of valid_group in the delete_depth_small()

2015-10-30 Thread Richardson, Bruce
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> 
> 2015-10-30 14:24, Bruce Richardson:
> > On Fri, Oct 30, 2015 at 02:22:27PM +, Bruce Richardson wrote:
> > > On Fri, Oct 30, 2015 at 09:14:39PM +0800, Jijiang Liu wrote:
> > >
> > > Title can be shortened to: "lpm: fix initialization of valid_group
> field"
> > >
> > > > Fixes an initialization issue of 'valid_group' in the
> delete_depth_small function.
> > > >
> > > > In this function, use new rte_lpm_tbl8_entry we call A to replace
> > > > the old rte_lpm_tbl8_entry. But the valid_group do not set VALID, so
> it will be INVALID.
> > > >
> > > > Then when adding a new route which depth is > 24,the tbl8_alloc()
> > > > function will search the rte_lpm_tbl8_entrys to find INVALID
> valid_group, and it will return the A to the add_depth_big function, so
> A's data is overridden.
> > > >
> > >
> > > Not sure this message is entirely clear.
> > > How about:
> > >   When adding an entry to a tbl8, the .valid_group field should always
> be set,
> > >   so that future adds do not accidently find and use this table,
> thinking it is
> > >   currently invalid, i.e. unused, and thereby overwrite existing
> entries.
> > >
> > > > Signed-off-by: NaNa 
> > > >
> > Assuming we get a little cleanup on commit title and log message
> > (Thomas, perhaps just a rewrite on commit?):
> 
> Giving the name of a field in the title is not really useful for the
> overview.
> It's better to talk about the use case which is fixed.

"lpm: fix incorrect reuse of already allocated tbl8" ??


[dpdk-dev] [PATCH v2 01/16] mk: Introduce ARMv7 architecture

2015-10-28 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jan Viktorin
> Sent: Wednesday, October 28, 2015 5:32 PM
> To: David Marchand 
> Cc: dev at dpdk.org; Vlastimil Kosar 
> Subject: Re: [dpdk-dev] [PATCH v2 01/16] mk: Introduce ARMv7 architecture
> 
> On Wed, 28 Oct 2015 14:34:40 +0100
> David Marchand  wrote:
> 
> > On Mon, Oct 26, 2015 at 5:37 PM, Jan Viktorin
> > 
> > wrote:
> >
> > > From: Vlastimil Kosar 
> > >
> > > Make DPDK run on ARMv7-A architecture. This patch assumes ARM
> > > Cortex-A9. However, it is known to be working on Cortex-A7 and
> > > Cortex-A15.
> > >
> > > Signed-off-by: Vlastimil Kosar 
> > > Signed-off-by: Jan Viktorin 
> > > ---
> > > v1 -> v2:
> > > * the -mtune parameter of GCC is configurable now
> > > * the -mfpu=neon can be turned off
> > >
> > > Signed-off-by: Jan Viktorin 
> > > ---
> > >  config/defconfig_arm-armv7-a-linuxapp-gcc | 78
> > > +++
> > >  mk/arch/arm/rte.vars.mk   | 39 
> > >  mk/machine/armv7-a/rte.vars.mk| 67
> ++
> > >  3 files changed, 184 insertions(+)
> > >  create mode 100644 config/defconfig_arm-armv7-a-linuxapp-gcc
> > >  create mode 100644 mk/arch/arm/rte.vars.mk  create mode 100644
> > > mk/machine/armv7-a/rte.vars.mk
> > >
> >
> > This patch comes too early in the patchset, I would put it once
> > compilation is fine (more comment to come, btw), so once all headers
> > are in place, not before.
> 
> Agree, this was done by watching the Power 8 patchset. But this seems
> quite logical.
> 
> >
> > Besides, do we really need this -a suffix ?
> 
> It is the full name of the ARM architecture - armv7, A profile. There are
> 3 profiles: A - application, R - real time, M - microcontroller.
> They differ in what MMU/MPU and other such things they support. But,
> finally, we can omit. The M profile is unsuitable for DPDK anyway. The R
> profile may be however used under certain circumstances (I believe, it can
> run Linux).
> 
If you do want to include the "a" maybe just drop the "-" before it, please. 
The "-" is used to separate the elements of the RTE_TARGET and so it doesn't 
look right.

/Bruce


[dpdk-dev] GIT workflow model to help solve some of the problems

2015-10-22 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wiles, Keith
> Sent: Thursday, October 22, 2015 2:14 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] GIT workflow model to help solve some of the problems
> 
> I have been looking at the a GIT flow model that seems to help with our
> backlog problem and how we stage releases.
> 
> I found this model a few years ago and find it to be very reasonable and
> helps us with development.
> http://nvie.com/posts/a-successful-git-branching-model/
> 
> The model may not be perfect, but it does give us the next branch via the
> develop branch and gets any bugs/patches off the master branch. I have not
> found a simpler solution that does not have problem and this one may have
> problems as well, but it seem to be reasonable. Some call the develop
> branch the ?next? branch, but it does not matter what it is called IMO.
> 
> ?
> Regards,
> ++Keith Wiles
> Intel Corporation

Hi Keith,

I agree that such a git model can work very well, and indeed we used to use 
something a bit like it internally on DPDK development in the pre-dpdk.org era. 
However, I don't think a branching model will solve the patch backlog issues we 
are currently having on dpdk.org, as the branching strategy can't possibly 
affect the speed at which reviews of patches get done, or how quickly reviewed 
patches get applied to the mainline for the next release. I believe that our 
problems are more caused by the limitation of 24 hours in a day, than in a lack 
of branches (or whole trees) in git. :-)

To try and solve the backlog, it's been discussed trying to scale things by 
increasing the number of committers on the project - specifically sub-tree 
maintainers - so as to reduce Thomas's direct workload. Thomas was hinting 
repeatedly at the userspace conference that he'd like me to take over as 
committer for a separate drivers/net subtree, and (following some arm-twisting 
from various sources :-)) I've decided that I do indeed need to take on such a 
role starting from the 2.3 release. [Declan has similarly already volunteered 
to maintain a subtree for crypto drivers.] I'm sure I'll learn soon enough what 
I've let myself in for! :-)

With regards to branches, I believe at userspace Stephen H had volunteered some 
time/resources to see about back-porting fixes to earlier releases, which 
presumably necessitates separate release branches like that in the link you 
sent. Stephen, any follow-up on that - is it still a possibility?

Regards,
/Bruce


[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-19 Thread Richardson, Bruce


> -Original Message-
> From: Panu Matilainen [mailto:pmatilai at redhat.com]
> Sent: Monday, October 19, 2015 2:26 PM
> To: Tetsuya Mukawa ; Richardson, Bruce
> ; Loftus, Ciara 
> Cc: dev at dpdk.org; ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
> 
> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
> > On 2015/10/19 18:45, Bruce Richardson wrote:
> >> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
> >>>> On 2015/10/16 21:52, Bruce Richardson wrote:
> >>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> >>>>>> The patch introduces a new PMD. This PMD is implemented as thin
> >>>> wrapper
> >>>>>> of librte_vhost. It means librte_vhost is also needed to compile
> the PMD.
> >>>>>> The PMD can have 'iface' parameter like below to specify a path
> >>>>>> to
> >>>> connect
> >>>>>> to a virtio-net device.
> >>>>>>
> >>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>>>>>
> >>>>>> To connect above testpmd, here is qemu command example.
> >>>>>>
> >>>>>> $ qemu-system-x86_64 \
> >>>>>>  
> >>>>>>  -chardev socket,id=chr0,path=/tmp/sock0 \
> >>>>>>  -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >>>>>>  -device virtio-net-pci,netdev=net0
> >>>>>>
> >>>>>> Signed-off-by: Tetsuya Mukawa 
> >>>>> With this PMD in place, is there any need to keep the existing
> >>>>> vhost library around as a separate entity? Can the existing
> >>>>> library be
> >>>> subsumed/converted into
> >>>>> a standard PMD?
> >>>>>
> >>>>> /Bruce
> >>>> Hi Bruce,
> >>>>
> >>>> I concern about whether the PMD has all features of librte_vhost,
> >>>> because librte_vhost provides more features and freedom than ethdev
> >>>> API provides.
> >>>> In some cases, user needs to choose limited implementation without
> >>>> librte_vhost.
> >>>> I am going to eliminate such cases while implementing the PMD.
> >>>> But I don't have strong belief that we can remove librte_vhost now.
> >>>>
> >>>> So how about keeping current separation in next DPDK?
> >>>> I guess people will try to replace librte_vhost to vhost PMD,
> >>>> because apparently using ethdev APIs will be useful in many cases.
> >>>> And we will get feedbacks like "vhost PMD needs to support like this
> usage".
> >>>> (Or we will not have feedbacks, but it's also OK.) Then, we will be
> >>>> able to merge librte_vhost and vhost PMD.
> >>> I agree with the above. One the concerns I had when reviewing the
> patch was that the PMD removes some freedom that is available with the
> library. Eg. Ability to implement the new_device and destroy_device
> callbacks. If using the PMD you are constrained to the implementations of
> these in the PMD driver, but if using librte_vhost, you can implement your
> own with whatever functionality you like - a good example of this can be
> seen in the vhost sample app.
> >>> On the other hand, the PMD is useful in that it removes a lot of
> complexity for the user and may work for some more general use cases. So I
> would be in favour of having both options available too.
> >>>
> >>> Ciara
> >>>
> >> Thanks.
> >> However, just because the libraries are merged does not mean that you
> >> need be limited by PMD functionality. Many PMDs provide additional
> >> library-specific functions over and above their PMD capabilities. The
> >> bonded PMD is a good example here, as it has a whole set of extra
> >> functions to create and manipulate bonded devices - things that are
> >> obviously not part of the general ethdev API. Other vPMDs similarly
> include functions to allow them to be created on the fly too.
> >>
> >> regards,
> >> /Bruce
> >
> > Hi Bruce,
> >
> > I appreciate for showing a good example. I haven't noticed the PMD.
> > I will check the bonding PMD, and try to remove librte_vhost without
> > losing freedom and features of the library.
> 
> Hi,
> 
> Just a gentle reminder - if you consider removing (even if by just
> replacing/renaming) an entire library, it needs to happen the ABI
> deprecation process.
> 
> It seems obvious enough. But for all the ABI policing here, somehow we all
> failed to notice the two compatibility breaking rename-elephants in the
> room during 2.1 development:
> - libintel_dpdk was renamed to libdpdk
> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
> 
> Of course these cases are easy to work around with symlinks, and are
> unrelated to the matter at hand. Just wanting to make sure such things
> dont happen again.
> 
>   - Panu -

Still doesn't hurt to remind us, Panu! Thanks. :-)


[dpdk-dev] [PATCH] crc: deinline crc functions

2015-10-13 Thread Richardson, Bruce
> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Friday, October 2, 2015 12:38 AM
> To: Richardson, Bruce; De Lara Guarch, Pablo
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH] crc: deinline crc functions
> 
> Because the CRC functions are inline and defined purely in the header
> file, every component that uses these functions gets its own copy
> of the software CRC table which is a big space waster.
> 
> Just deinline which give better long term ABI stablity anyway.
> 
> Signed-off-by: Stephen Hemminger 

While I think it's a good idea to de-inline the functions that
do the calculations using the lookup tables, I think the
functions that consist of a single assembly instruction should be
kept as inline. 

/Bruce


[dpdk-dev] propose a solution for mapping same virtual address space to asymmetric processes

2015-10-13 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Nissim Nisimov
> Sent: Tuesday, October 13, 2015 4:40 PM
> To: 'dev at dpdk.org'
> Subject: [dpdk-dev] propose a solution for mapping same virtual address
> space to asymmetric processes
> 
> Hi all,
> 
> The below will try to suggest a modification to the initialization of
> Environment Abstraction Layer (AKA EAL) so it will be able to allocate
> memory zones from same virtual memory addresses even if the primary
> process is not similar to the secondary processes.
> 
> Problem:
> The DPDK Primary/Secondary model requires that the exact same hugepage
> memory mappings be present in all applications.
> An issue may occur when the Primary and secondary processes are not
> symmetric in such way that the code has big differences (for example,
> Primary process is a traffic distributer and secondary is a worker).
> The result may be that specific virtual address region in the first
> process won't be available in the second process.
> 
> 
> Suggested solution:
> Map all related rte and uio sections somewhere close to the end of huge
> pages memory (that mean rte_eal_memory_init() should be called before
> rte_config_init() in primary process) According to our observations there
> will be more probability to success when allocating the above sections
> after huge pages section (actually uio is already allocated after the huge
> pages area)
> 
> It solved our problem when trying to work with a primary traffic
> distributer which is a very "light" process and few secondary worker
> processes.
> 
> 
> Please share your thoughts on this before I will try to commit our patch
> for review
> 
> Thanks,
> Nissim

Hi,

out of interest, have you tried fixing the issue using the "--base-virtaddr" 
EAL flag to hint a base address to the primary process? It was put into the 
code some time ago to help solve exactly this problem.

/Bruce


[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Richardson, Bruce
> -Original Message-
> From: Charles (Chas) Williams [mailto:3chas3 at gmail.com]
> 
> On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> > On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > > If a system is using deterministic interface names, it may be easier
> > > in some cases to use the interface name to blacklist an interface.
> > >
> >
> > Is it possible to do this using the existing arguments, i.e. have the
> > -b flag detect if it's a pci address or name automatically, rather
> > than having to use a separate command-line arg for it?
> 
> You might be able to distinguish names by context.  I doubt interface
> names ever look like PCI addresses.  But that's going to be a bigger
> change since -b will need to be updated to 'blacklist' intead of 'pci-
> blacklist' to prevent confusion.  Or do you just want to overload '-b' and
> keep both long options?
>
I'm not sure about that, to be honest. However, I'd rather not have
too many cmd line options to be maintained in the code. 

Does you proposed blacklisting patch work with non-pci devices as well
as with PCI ones as now?

/Bruce


[dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files

2015-10-02 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Friday, October 2, 2015 11:50 AM
> To: Arevalo, Mario Alfredo C; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic
> bind files
> 
> On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> > Add hierarchy-file support to the DPDK nic bind files, when invoking
> > "make install-sbin" nic bind files will be installed by default in:
> > $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind by
> > default, you can override SBIN_DIR var.
> > This hierarchy is based on:
> > http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
> > and dpdk spec file.
> >
> > Signed-off-by: Mario Carrillo 
> > ---
> >   mk/rte.sdkinstall.mk | 14 ++
> >   mk/rte.sdkroot.mk|  4 ++--
> >   2 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk index
> > 5a2fd40..4eecf31 100644
> > --- a/mk/rte.sdkinstall.mk
> > +++ b/mk/rte.sdkinstall.mk
> > @@ -46,11 +46,13 @@ else
> >   INCLUDE_DIR ?= /usr/include/dpdk
> >   BIN_DIR ?= /usr/bin
> >   DOC_DIR ?= /usr/share/doc/dpdk
> > +SBIN_DIR ?= /usr/sbin/dpdk_nic_bind
> >   HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
> >   BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
> >   LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
> >   MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
> >   DOCS := $(wildcard $(BUILD_DIR)/doc/*)
> > +NIC_BIND_FILES := $(wildcard $(BUILD_DIR)/tools/*nic_bind.py)
> >   include $(BUILD_DIR)/build/.config
> >   RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
> >   RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%) @@ -161,6 +163,18 @@
> > install-doc:
> > echo installing: $$DOC; \
> > done
> >   #
> > +# install nic bind files in /usr/sbin/dpdk_nic_bind # by default
> > +SBIN_DIR can be overridden.
> > +#
> 
> This creates an out-of-path directory /usr/sbin/dpdk_nic_bind/ in which
> the dpdk_nic_bind.py is installed. Besides not being a very accessible
> location, the FHS explicitly forbids creation of subdirectories below
> /usr/[s]bin.
> 
> SBIN_DIR should be /usr/sbin unless overridden, but OTOH I think this
> could go into /usr/bin just as well, the split is fairly ambiguous anyway
> (I mean, testpmd is not something a regular user is going to run
> either)
> 
> In addition, if dpdk_nic_bind.py is installed then perhaps the
> cpu_layout.py utility should be installed too?
> 
>   - Panu -

I think there are better utilities available for determining the core layout
that cpu_layout.py. "lstopo", for one, is much more powerful. Do we want/need
to keep our own script around for that?

/Bruce


[dpdk-dev] No egressing packet

2015-09-16 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wu, Yiwen
> Sent: Wednesday, September 16, 2015 2:31 PM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] No egressing packet
> 
> Thomas,
> 
> I am using rte_eth_tx_burst to send packets. I also use
> rte_eth_add_tx_callback to register a callback to figure out what has been
> sent.
> 

I would suggest checking that the source mac address is correct for all
outgoing frames as packets with an incorrect source are likely to be blocked
due to security (spoofing) concerns.

/Bruce

> Thanks,
> 
> Yiwen
> 
> On 9/16/2015 3:32 AM, Thomas Monjalon wrote:
> > 2015-09-15 17:10, Wu, Yiwen:
> >> Hi all,
> >>
> >> I am new to dpdk. I am running a single forwarding program based on
> >> dpdk 2.1.0. The program runs on a VM, binding on two interfaces. All
> >> it's doing is to forward packets from one interface to another. All
> >> ingressing packets are fine but there seems no egressing packets. I
> >> used rte_eth_add_tx_callback to register a tx callback. The callback
> >> is able to print the right egress packet but the destination is just
> >> not receiving it (via tcpdump). Does anybody have the similar
> experience?
> >> Any solution or hints will be great.
> > You need to call rte_eth_tx_burst() instead of
> rte_eth_add_tx_callback().
> > For more information, please check the guide:
> >
> > http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html#poll-mode-dri
> > ver You can also check this basic example:
> > http://dpdk.org/browse/dpdk/tree/examples/skeleton/basicfwd.c



[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladislav Zolotarov
> Sent: Friday, September 11, 2015 5:04 PM
> To: Avi Kivity
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1
> for all NICs but 82598
> 
> On Sep 11, 2015 6:43 PM, "Avi Kivity"  wrote:
> >
> > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
> >>
> >>
> >> On Sep 11, 2015 5:55 PM, "Thomas Monjalon"
> >> 
> wrote:
> >> >
> >> > 2015-09-11 17:47, Avi Kivity:
> >> > > On 09/11/2015 05:25 PM, didier.pallard wrote:
> >> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
> >> > > >>
> >> > > >> Helin, the issue has been seen on x540 devices. Pls., see a
> chapter
> >> > > >> 7.2.1.1 of x540 devices spec:
> >> > > >>
> >> > > >> A packet (or multiple packets in transmit segmentation) can
> >> > > >> span
> any
> >> > > >> number of
> >> > > >> buffers (and their descriptors) up to a limit of 40 minus
> >> > > >> WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details and
> >> > > >> section Section 7.2.3.5.1
> for
> >> > > >> WTHRESH
> >> > > >> details). For best performance it is recommended to minimize
> >> > > >> the number of buffers as possible.
> >> > > >>
> >> > > >> Could u, pls., clarify why do u think that the maximum number
> >> > > >> of
> data
> >> > > >> buffers is limited by 8?
> >> > > >>
> >> > > >> thanks,
> >> > > >> vlad
> >> > > >
> >> > > > Hi vlad,
> >> > > >
> >> > > > Documentation states that a packet (or multiple packets in
> >> > > > transmit
> >> > > > segmentation) can span any number of buffers (and their
> >> > > > descriptors) up to a limit of 40 minus WTHRESH minus 2.
> >> > > >
> >> > > > Shouldn't there be a test in transmit function that drops
> >> > > > properly
> the
> >> > > > mbufs with a too large number of segments, while incrementing a
> >> > > > statistic; otherwise transmit
> function
> >> > > > may be locked by the faulty packet without notification.
> >> > > >
> >> > >
> >> > > What we proposed is that the pmd expose to dpdk, and dpdk expose
> >> > > to
> the
> >> > > application, an mbuf check function.  This way applications that
> >> > > can generate complex packets can verify that the device will be
> >> > > able to process them, and applications that only generate simple
> >> > > mbufs can
> avoid
> >> > > the overhead by not calling the function.
> >> >
> >> > More than a check, it should be exposed as a capability of the port.
> >> > Anyway, if the application sends too much segments, the driver must
> >> > drop it to avoid hang, and maintain a dedicated statistic counter
> >> > to
> allow
> >> > easy debugging.
> >>
> >> I agree with Thomas - this should not be optional. Malformed packets
> should be dropped. In the icgbe case it's a very simple test - it's a
> single branch per packet so i doubt that it could impose any measurable
> performance degradation.
> >>
> >>
> >
> > A drop allows the application no chance to recover.  The driver must
> either provide the ability for the application to know that it cannot
> accept the packet, or it must fix it up itself.
> 
> An appropriate statistics counter would be a perfect tool to detect such
> issues. Knowingly sending a packet that will cause a HW to hang is not
> acceptable.

I would agree. Drivers should provide a function to query the max number of
segments they can accept and the driver should be able to discard any packets
exceeding that number, and just track it via a stat.

/Bruce


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-09-11 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladislav Zolotarov
> Sent: Friday, September 11, 2015 4:13 PM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1
> for all NICs but 82598
> 
> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" 
> wrote:
> >
> > 2015-09-11 17:47, Avi Kivity:
> > > On 09/11/2015 05:25 PM, didier.pallard wrote:
> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote:
> > > >>
> > > >> Helin, the issue has been seen on x540 devices. Pls., see a
> > > >> chapter
> > > >> 7.2.1.1 of x540 devices spec:
> > > >>
> > > >> A packet (or multiple packets in transmit segmentation) can span
> > > >> any number of buffers (and their descriptors) up to a limit of 40
> > > >> minus WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details
> > > >> and section Section 7.2.3.5.1 for WTHRESH details). For best
> > > >> performance it is recommended to minimize the number of buffers
> > > >> as possible.
> > > >>
> > > >> Could u, pls., clarify why do u think that the maximum number of
> > > >> data buffers is limited by 8?
> > > >>
> > > >> thanks,
> > > >> vlad
> > > >
> > > > Hi vlad,
> > > >
> > > > Documentation states that a packet (or multiple packets in
> > > > transmit
> > > > segmentation) can span any number of buffers (and their
> > > > descriptors) up to a limit of 40 minus WTHRESH minus 2.
> > > >
> > > > Shouldn't there be a test in transmit function that drops properly
> > > > the mbufs with a too large number of segments, while incrementing
> > > > a statistic; otherwise transmit function may be locked by the
> > > > faulty packet without notification.
> > > >
> > >
> > > What we proposed is that the pmd expose to dpdk, and dpdk expose to
> > > the application, an mbuf check function.  This way applications that
> > > can generate complex packets can verify that the device will be able
> > > to process them, and applications that only generate simple mbufs
> > > can avoid the overhead by not calling the function.
> >
> > More than a check, it should be exposed as a capability of the port.
> > Anyway, if the application sends too much segments, the driver must
> > drop it to avoid hang, and maintain a dedicated statistic counter to
> > allow easy debugging.
> 
> I agree with Thomas - this should not be optional. Malformed packets
> should be dropped. In the icgbe case it's a very simple test - it's a
> single branch per packet so i doubt that it could impose any measurable
> performance degradation.
> 
Actually, it could very well do - we'd have to test it. For the vector IO
paths, every additional cycle in the RX or TX paths causes a noticeable perf
drop.

/Bruce


[dpdk-dev] [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

2015-09-07 Thread Richardson, Bruce


> -Original Message-
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Monday, September 7, 2015 3:15 PM
> To: Richardson, Bruce; dev at dpdk.org
> Cc: Ananyev, Konstantin
> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
> function
> 
> 
> 
> On 07/09/15 13:57, Richardson, Bruce wrote:
> >
> >
> >> -Original Message-
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Monday, September 7, 2015 1:26 PM
> >> To: dev at dpdk.org
> >> Cc: Ananyev, Konstantin; Richardson, Bruce
> >> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD
> >> receive function
> >>
> >> Hi,
> >>
> >> I just realized I've missed the "[PATCH]" tag from the subject. Did
> >> anyone had time to review this?
> >>
> >
> > Hi Zoltan,
> >
> > the big thing that concerns me with this is the addition of new
> > instructions for each packet in the fast path. Ideally, this
> > prefetching would be better handled in the application itself, as for
> > some apps, e.g. those using pipelining, the core doing the RX from the
> > NIC may not touch the packet data at all, and the prefetches will
> instead cause a performance slowdown.
> >
> > Is it possible to get the same performance increase - or something
> > close to it - by making changes in OVS?
> 
> OVS already does a prefetch when it's processing the previous packet, but
> apparently it's not early enough. At least for my test scenario, where I'm
> forwarding UDP packets with the least possible overhead. I guess in tests
> where OVS does more complex processing it should be fine.
> I'll try to move the prefetch earlier in OVS codebase, but I'm not sure if
> it'll help.

I would suggest trying to prefetch more than one packet ahead. Prefetching 4 or
8 ahead might work better, depending on the processing being done.

/Bruce


[dpdk-dev] [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

2015-09-07 Thread Richardson, Bruce


> -Original Message-
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Monday, September 7, 2015 1:26 PM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
> function
> 
> Hi,
> 
> I just realized I've missed the "[PATCH]" tag from the subject. Did anyone
> had time to review this?
> 

Hi Zoltan,

the big thing that concerns me with this is the addition of new instructions for
each packet in the fast path. Ideally, this prefetching would be better handled
in the application itself, as for some apps, e.g. those using pipelining, the
core doing the RX from the NIC may not touch the packet data at all, and the
prefetches will instead cause a performance slowdown.

Is it possible to get the same performance increase - or something close to it -
by making changes in OVS?

Regards,
/Bruce

> Regards,
> 
> Zoltan
> 
> On 01/09/15 20:17, Zoltan Kiss wrote:
> > The lack of this prefetch causes a significant performance drop in
> > OVS-DPDK: 13.3 Mpps instead of 14 when forwarding 64 byte packets.
> > Even though OVS prefetches the next packet's header before it starts
> > processing the current one, it doesn't get there fast enough. This
> > aligns with the behaviour of other receive functions.
> >
> > Signed-off-by: Zoltan Kiss 
> > ---
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > index cf25a53..51299fa 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > @@ -502,6 +502,15 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq,
> struct rte_mbuf **rx_pkts,
> >  _mm_storeu_si128((void *)_pkts[pos]-
> >rx_descriptor_fields1,
> >  pkt_mb1);
> >
> > +   rte_packet_prefetch((char*)(rx_pkts[pos]->buf_addr) +
> > +   RTE_PKTMBUF_HEADROOM);
> > +   rte_packet_prefetch((char*)(rx_pkts[pos + 1]->buf_addr)
> +
> > +   RTE_PKTMBUF_HEADROOM);
> > +   rte_packet_prefetch((char*)(rx_pkts[pos + 2]->buf_addr)
> +
> > +   RTE_PKTMBUF_HEADROOM);
> > +   rte_packet_prefetch((char*)(rx_pkts[pos + 3]->buf_addr)
> +
> > +   RTE_PKTMBUF_HEADROOM);
> > +
> >  /* C.4 calc avaialbe number of desc */
> >  var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
> >  nb_pkts_recd += var;
> >


[dpdk-dev] [RFC PATCH 1/6] librte_ether: add fields from rte_pci_driver to rte_eth_dev and rte_eth_dev_data.

2015-09-01 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Tuesday, September 1, 2015 1:37 PM
> To: Iremonger, Bernard; Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 1/6] librte_ether: add fields from
> rte_pci_driver to rte_eth_dev and rte_eth_dev_data.
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Iremonger,
> > Bernard
> > Sent: Tuesday, September 01, 2015 12:39 PM
> > To: Thomas Monjalon
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 1/6] librte_ether: add fields from
> rte_pci_driver to rte_eth_dev and rte_eth_dev_data.
> >
> > Hi THomas,
> >
> > 
> >
> > > > @@ -424,7 +425,10 @@ rte_eth_dev_socket_id(uint8_t port_id)  {
> > > > if (!rte_eth_dev_is_valid_port(port_id))
> > > > return -1;
> > > > -   return rte_eth_devices[port_id].pci_dev->numa_node;
> > > > +   if (rte_eth_devices[port_id].dev_type == RTE_ETH_DEV_PCI)
> > > > +   return rte_eth_devices[port_id].pci_dev->numa_node;
> > > > +   else
> > > > +   return rte_eth_devices[port_id].data->numa_node;
> > >
> > > Clearly not the way to go.
> > > We should remove the special cases (PCI, PDEV, VDEV) instead of
> > > adding more checks.
> >
> > The dev_type field is not new, just using it now to distinguish between
> PCI and non PCI devices.
> >
> > I have moved some of the RTE_PCI_DRV flags to a new dev_flags field in
> > struct rte_eth_dev{}, These flags are independent of the driver type
> (PCI or non PCI).
> > Do you have view on this new dev_flags field and macros?
> 
> What looks strange here to me, I that we now we duplicate some fields
> here?
> Let say for PCI devices numa_node would be precent in pci_dev and in data,
> right?
> If there are some fields that are common for all device types (PCI, VDEV,
> etc) why not to create some unified structure for them that would be used
> by all device types and remove subsequent fields from pci_dev?
> Or alternatively create a union of structs (one struct per device type)?
> Then, as was pointed before, we wouldn't these branches by device_type at
> all.
> Konstantin
> 

I wouldn't worry so much about the duplicated data, so long as the ethdev APIs 
only
have to look for the data in a single place when necessary. [Some data may be 
naturally
present in the pci_dev structure, for instance, and then be copied by the 
driver into
the common ethdev structure. If however, that copy and duplication can be 
avoided,
great!]

/Bruce


[dpdk-dev] [PATCH v2] ixgbe: fix check for split packets

2015-07-22 Thread Richardson, Bruce


> -Original Message-
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Wednesday, July 22, 2015 2:20 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] ixgbe: fix check for split packets
> 
> 
> 
> On 22/07/15 10:59, Bruce Richardson wrote:
> > On Wed, Jul 22, 2015 at 10:47:34AM +0100, Zoltan Kiss wrote:
> >> Hi,
> >>
> >> And what happens if someone changes RTE_IXGBE_VPMD_RX_BURST to
> >> something else than 32? I guess this bug were introduced when someone
> >> raised it from
> >> 16 to 32
> >
> > Actually, no, this bug is purely due to me getting my maths wrong when
> > I wrote this function. The vector PMD has always worked in bursts of
> > 32 at a time.
> >
> >> I think you are better off with a for loop which uses this value. Or
> >> at least make a comment around RTE_IXGBE_VPMD_RX_BURST that if you
> >> change that value this check should be modified as well.
> >
> > The vector PMD always works off a fixed 32 burst size. Any change to
> > that will lead to many changes in the code, so I don't believe a loop is
> necessary.
> 
> Ok, then I suggest to make a comment around RTE_IXGBE_VPMD_RX_BURST that
> changing it needs a lot of other changes in the code elsewhere, e.g in
> this split_flags check.
> Btw. vPMD was a bit misleading abbreviation for me, it took me a while
> until I realized 'v' stands for 'vector', not 'virtualization' as in most
> cases nowadays.

Good idea. I'll try to submit a patch to add a comment if I get the chance - 
otherwise feel free to do so yourself.

As for the naming, yes, I suppose it can be confusing. :-) We possibly need to 
start calling these pieces of code SSE or AVX rather than just vector, since 
for some we may end up with multiple vector versions.

/Bruce

> 
> >
> > Regards,
> > /Bruce
> >
> >>
> >> Regards,
> >>
> >> Zoltan
> >>
> >> On 22/07/15 10:13, Bruce Richardson wrote:
> >>> The check for split packets to be reassembled in the vector ixgbe
> >>> PMD was incorrectly only checking the first 16 elements of the array
> >>> instead of all 32. This is fixed by changing the uint32_t values to
> >>> be uint64_t instead.
> >>>
> >>> Fixes: cf4b4708a88a ("ixgbe: improve slow-path perf with vector
> >>> scattered Rx")
> >>>
> >>> Reported-by: Zoltan Kiss 
> >>> Signed-off-by: Bruce Richardson 
> >>>
> >>> ---
> >>> V2: Rename variable from split_fl32 to split_fl64 to match type
> change.
> >>> ---
> >>>   drivers/net/ixgbe/ixgbe_rxtx_vec.c | 6 +++---
> >>>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> index d3ac74a..f2052c6 100644
> >>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> @@ -549,10 +549,10 @@ ixgbe_recv_scattered_pkts_vec(void *rx_queue,
> struct rte_mbuf **rx_pkts,
> >>>   return 0;
> >>>
> >>>   /* happy day case, full burst + no packets to be joined */
> >>> - const uint32_t *split_fl32 = (uint32_t *)split_flags;
> >>> + const uint64_t *split_fl64 = (uint64_t *)split_flags;
> >>>   if (rxq->pkt_first_seg == NULL &&
> >>> - split_fl32[0] == 0 && split_fl32[1] == 0 &&
> >>> - split_fl32[2] == 0 && split_fl32[3] == 0)
> >>> + split_fl64[0] == 0 && split_fl64[1] == 0 &&
> >>> + split_fl64[2] == 0 && split_fl64[3] == 0)
> >>>   return nb_bufs;
> >>>
> >>>   /* reassemble any packets that need reassembly*/
> >>>


[dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing RX/TX ring

2015-07-20 Thread Richardson, Bruce


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Monday, July 20, 2015 10:37 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing
> RX/TX ring
> 
> Hi Bruce,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Friday, July 03, 2015 4:40 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH 2/2] ixgbe: check mbuf refcnt when clearing
> > RX/TX ring
> >
> > The function to clear the TX ring when a port was being closed, e.g.
> > on exit in testpmd, was not checking the mbuf refcnt before freeing it.
> > Since the function in the vector driver to clear the ring after TX
> > does not set the pointer to NULL post-free, this caused crashes if
> > mbuf debugging was turned on.
> >
> > To reproduce the issue, ensure the follow config variables are set:
> > RTE_IXGBE_INC_VECTOR
> > RTE_LIBRTE_MBUF_DEBUG
> > Then compile up and run testpmd using 10G ports with the vector driver.
> > Start traffic and let some flow through, then type "stop" and "quit"
> > at the testpmd prompt, and crash will occur. Output below:
> >
> > testpmd> quit
> > Stopping port 0...done
> > Stopping port 1...PANIC in rte_mbuf_sanity_check():
> > bad ref cnt
> > [New Thread 0x7fffabfff700 (LWP 145312)]
> > [New Thread 0x7fffb47fe700 (LWP 145311)]
> > [New Thread 0x7fffb4fff700 (LWP 145310)]
> > [New Thread 0x76cd5700 (LWP 145309)]
> > 18: [/home/bruce/dpdk.org/x86_64-native-linuxapp-
> gcc/app/testpmd(_start+0x29)
> > <snip for brevity...>
> > Program received signal SIGABRT, Aborted.
> > 0x77120a98 in raise () from /lib64/libc.so.6
> >
> > A similar error occurs when clearing the RX ring, which is also fixed
> > by this patch.
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++-
> >  drivers/net/ixgbe/ixgbe_rxtx_vec.c | 8 +++-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c
> > b/drivers/net/ixgbe/ixgbe_rxtx.c index 41a062e..12e25b7 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > @@ -2108,7 +2108,8 @@ ixgbe_rx_queue_release_mbufs(struct
> > ixgbe_rx_queue *rxq)
> >
> > if (rxq->sw_ring != NULL) {
> > for (i = 0; i < rxq->nb_rx_desc; i++) {
> > -   if (rxq->sw_ring[i].mbuf != NULL) {
> > +   if (rxq->sw_ring[i].mbuf != NULL &&
> > +   
> > rte_mbuf_refcnt_read(rxq->sw_ring[i].mbuf))
> {
> > rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
> > rxq->sw_ring[i].mbuf = NULL;
> > }
> 
> 
> Sorry for late review, but I am afraid your changes don't fix the problem.
> After sw_ring[].mbuf was freed by RX path, entity that manages that RX
> queue shouldn't touch that mbuf.
> (unless it was allocated  by the same RX queue again).
> As same mbuf could be already allocated by something else.
> As an example by another RX/TX queue and is in active use.
> Same story for TX below.

Good point, I'd forgotten about that scenario.

> 
> As I can see the proper fix could be one of 2:
> 1. Make RX/TX vector functions to reset sw_ring[].mbuf to 0.
> 2. At queue_release_mbufs(), don't go through all sw_ring[] entries, but
> only though ones which might contain valid mbufs.
> For RX: entries between rx_tail and rxrearm_start only (which implies a
> special queue_release_mbufs() for vector rx).
> For TX: from tx_next_dd - (tx_rs_thresh - 1) and no more then nb_tx_desc -
> nb_tx_free

Option 2 seems a better choice for a fix. I'll look at it when I get a chance.

/Bruce

> 
> Konstantin
> 
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > index 0edac82..7e633d3 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > @@ -665,7 +665,13 @@ ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue
> *txq)
> >  nb_free < max_desc && i != txq->tx_tail;
> >  i = (i + 1) & max_desc) {
> > txe = (struct ixgbe_tx_entry_v *)>sw_ring[i];
> > -   if (txe->mbuf != NULL)
> > +   /*
> > +*check for already freed packets.
> > +* Note: ixgbe_tx_free_bufs does not NULL after free,
> > +* so we actually have to check the reference count.
> > +*/
> > +   if (txe->mbuf != NULL &&
> > +   rte_mbuf_refcnt_read(txe->mbuf) != 0)
> > rte_pktmbuf_free_seg(txe->mbuf);
> > }
> > /* reset tx_entry */
> > --
> > 2.4.3



[dpdk-dev] [PATCH] Add unit test for thash library

2015-06-19 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir Medvedkin
> Sent: Friday, June 19, 2015 3:56 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] Add unit test for thash library
> 
> Add unit test for thash library
> 
Missing sign-off.

> ---
>  app/test/Makefile |   2 +
>  app/test/autotest_data.py |  13 
>  app/test/test_thash.c | 164
> ++
>  3 files changed, 179 insertions(+)
>  create mode 100644 app/test/test_thash.c
> 
> diff --git a/app/test/Makefile b/app/test/Makefile
> index 5cf8296..fc6a247 100644
> --- a/app/test/Makefile
> +++ b/app/test/Makefile
> @@ -85,6 +85,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
>  SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
>  SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c
> 
> +SRCS-y += test_thash.c
> +
>  SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c
>  SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm6.c
> 
> diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
> index 0c3802b..7653f09 100644
> --- a/app/test/autotest_data.py
> +++ b/app/test/autotest_data.py
> @@ -475,6 +475,19 @@ non_parallel_test_group_list = [
>   },
>   ]
>  },
> +{
> + "Prefix" :  "thash",
> + "Memory" :  "32",
> + "Tests" :
> + [
> + {
> + "Name" :   "Thash autotest",
> + "Command" :"thash_autotest",
> + "Func" :   default_autotest,
> + "Report" : None,
> +},
> + ]
> +},
> 
>  #
>  # Please always make sure that ring_perf is the last test!
> diff --git a/app/test/test_thash.c b/app/test/test_thash.c
> new file mode 100644
> index 000..4c863cc
> --- /dev/null
> +++ b/app/test/test_thash.c
> @@ -0,0 +1,164 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Vladimir Medvedkin 
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above
> copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
> USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +//#include 
> +//#include 
> +//#include 
> +#include 
> +//#include 
> +//#include 

Please just delete the commented out lines, there is no need to keep them.

> +
> +#include 
> +#include 
> +#include 
> +
> +#include "test.h"
> +
> +#include 
> +
> +struct test_thash_v4 {
> + uint32_tdst_ip;
> + uint32_tsrc_ip;
> + uint16_tdst_port;
> + uint16_tsrc_port;
> + uint32_thash_l3;
> + uint32_thash_l3l4;
> +};
> +
> +struct test_thash_v6 {
> + uint8_t dst_ip[16];
> + uint8_t src_ip[16];
> + uint16_tdst_port;
> + uint16_tsrc_port;
> + uint32_thash_l3;
> + uint32_thash_l3l4;
> +};
> +
> +/*From 82599 Datasheet p.309 ??7.1.2.8.RSS Verification Suite*/

Strange characters present in the above line (they don't show up for me in mutt 
though).
I'd also suggest dropping the page number, as that can probably change across 
different versions of the datasheet. [I have a (very) old copy of the datasheet 
myself, and it's only on page 248 there. It's obviously time for me to pull 
down an updated copy :-)).

/Bruce


[dpdk-dev] [PATCH v4] Add toeplitz hash algorithm used by RSS

2015-06-19 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir Medvedkin
> Sent: Friday, June 19, 2015 3:56 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4] Add toeplitz hash algorithm used by RSS
> 
> v4 changes
> - Fix copyright
> - rename bswap_mask constant, add rte_ prefix
> - change rte_ipv[46]_tuple struct
> - change rte_thash_load_v6_addr prototype
> 
> v3 changes
> - Rework API to be more generic
> - Add sctp_tag into tuple
> 
> v2 changes
> - Add ipv6 support
> - Various style fixes
> 

Missing signoff line.

> ---
>  lib/librte_hash/Makefile|   1 +
>  lib/librte_hash/rte_thash.h | 202
> 
>  2 files changed, 203 insertions(+)
>  create mode 100644 lib/librte_hash/rte_thash.h
> 
<...snip...>
> +
> +/* Byte swap mask used for converting IPv6 address 4-byte chunks to CPU
> byte order */
> +static const __m128i rte_thash_ipv6_bswap_mask = {0x0405060700010203,
> 0x0C0D0E0F08090A0B};
> +
> +#define RTE_THASH_V4_L3   2  /*calculate hash of ipv4 header only*/
> +#define RTE_THASH_V4_L4   3  /*calculate hash of ipv4 + transport
> headers*/
> +#define RTE_THASH_V6_L3   8  /*calculate hash of ipv6 header only
> */
> +#define RTE_THASH_V6_L4   9  /*calculate hash of ipv6 + transport
> headers */

I'm still not seeing why these values need to be defined here, rather than in a 
specific app.
Also, the choice of values for these defines seems strange to me? How were they 
chosen?

/Bruce



[dpdk-dev] [dpdk-announce] important design choices - statistics - ABI

2015-06-17 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, June 17, 2015 11:35 AM
> To: Thomas Monjalon
> Cc: announce at dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices -
> statistics - ABI
> 
> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > Hi all,
> >
> > Sometimes there are some important discussions about architecture or
> > design which require opinions from several developers. Unfortunately,
> > we cannot read every threads. Maybe that using the announce mailing
> > list will help to bring more audience to these discussions.
> > Please note that
> > - the announce@ ML is moderated to keep a low traffic,
> > - every announce email is forwarded to dev@ ML.
> > In case you want to reply to this email, please use dev at dpdk.org
> address.
> >
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> > http://dpdk.org/ml/archives/dev/2015-June/019461.html
> >
> > During the development of the release 2.0, there was an agreement to
> > keep ABI compatibility or to bring new ABI while keeping old one during
> one release.
> > In case it's not possible to have this transition, the (exceptional)
> > break should be acknowledged by several developers.
> > http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> > There were some interesting discussions but not a lot of participants:
> >
> > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=84
> > 61
> >
> > During the current development cycle for the release 2.1, the ABI
> > question arises many times in different threads.
> > To add the hash key size field, it is proposed to use a struct padding
> gap:
> > http://dpdk.org/ml/archives/dev/2015-June/019386.html
> > To support the flow director for VF, there is no proposal yet:
> > http://dpdk.org/ml/archives/dev/2015-June/019343.html
> > To add the speed capability, it is proposed to break ABI in the release
> 2.2:
> > http://dpdk.org/ml/archives/dev/2015-June/019225.html
> > To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> > http://dpdk.org/ml/archives/dev/2015-June/019443.html
> > To add the interrupt mode, it is proposed to add a build-time option
> > CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
> binary:
> > http://dpdk.org/ml/archives/dev/2015-June/018947.html
> > To add the packet type, there is a proposal to add a build-time option
> > CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> > http://dpdk.org/ml/archives/dev/2015-June/019172.html
> > We must also better document how to remove a deprecated ABI:
> > http://dpdk.org/ml/archives/dev/2015-June/019465.html
> > The ABI compatibility is a new constraint and we need to better
> > understand what it means and how to proceed. Even the macros are not yet
> well documented:
> > http://dpdk.org/ml/archives/dev/2015-June/019357.html
> >
> > Thanks for your attention and your participation in these important
> choices.
> >
> 
> Thomas-
>   Just to re-iterate what you said earlier, and what was discussed in
> the previous ABI discussions
> 
> 1) ABI stability was introduced to promote DPDK's ability to be included
> with various linux and BSD distributions.  Distributions, by and large,
> favor building libraries as DSO's, favoring security and updatability in
> favor of all out performance.
> 
> 2) The desire was to put DPDK developers in a mindset whereby ABI
> stability was something they needed to think about during development, as
> the DPDK exposes many data structures and instances that cannot be changed
> without breaking ABI
> 
> 3) The versioning mechanism was introduced to allow for backward
> compatibility during periods in which we needed to support both an old an
> new ABI
> 
> 4) As Stephan and others point out, its not expected that we will always
> be able to maintain ABI, and as such an easy library versioning mechanism
> was introduced to prevent the loading of an incompatible library with an
> older application
> 
> 5) The ABI policy was introduced to create a method by which new ABI
> facets could be scheduled while allowing distros to prepare their
> downstream users for the upcomming changes.
> 
> 
> It seems to me, looking back over these last few months, that we're
> falling down a bit on our use of (3).  I've seen several people take
> advantage of the ABI scheduled updates, but no one has tried the
> versioning interface, and as a result patches are getting delayed, which
> was never my intent.  Not sure whats to be done about that, but we should
> probably address it.  Is use of the versionnig interface just too hard or
> convoluted?
> 
> Neil


Hi Neil,

on my end, some suggestions:

1. the documentation on changing an API function provided in 

[dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type

2015-05-21 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, May 20, 2015 7:47 PM
> To: Marc Sune
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
> 
> On Wed, May 20, 2015 at 07:01:02PM +0200, Marc Sune wrote:
> >
> >
> > On 20/05/15 12:28, Neil Horman wrote:
> > >On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote:
> > >>
> > >>On 20/05/15 10:31, Thomas Monjalon wrote:
> > >>>2015-05-19 12:31, Bruce Richardson:
> > On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote:
> > >Hi all,
> > >
> > >after a small amount of offline discussion with Marc Sune, here
> > >is an alternative proposal for a higher-level interface - aka
> > >pktdev - to allow a common Rx/Tx API across device types handling
> > >mbufs [for now, ethdev, ring and KNI]. The key code is in the
> > >first patch fo the set - the second is an example of a trivial
> usecase.
> > >
> > >What is different about this to previously:
> > >* wrapper class, so no changes to any existing ring, ethdev
> > >implementations
> > >* use of function pointers for RX/TX with an API that maps to
> ethdev
> > >   - this means there is little/no additional overhead for ethdev
> calls
> > >   - inline special case for rings, to accelerate that. Since we
> are at a
> > > higher level, we can special case process some things if
> appropriate. This
> > > means the impact to ring ops is one (predictable) branch per
> > >burst
> > >* elimination of the queue abstraction. For the ring and KNI, there
> is no
> > >   concept of queues, so we just wrap the functions directly (no
> need even for
> > >   wrapper functions, the api's match so we can call directly).
> This also
> > >   means:
> > >   - adding in features per-queue, is far easier as we don't need
> to worry about
> > > having arrays of multiple queues. For example:
> > >   - adding in buffering on TX (or RX) is easier since again we
> only have a
> > > single queue.
> > >* thread safety is made easier using a wrapper. For a MP ring, we
> can create
> > >   multiple pktdevs around it, and each thread will then be able to
> use their
> > >   own copy, with their own buffering etc.
> > >
> > >However, at this point, I'm just looking for general feedback on
> > >this as an approach. I think it's quite flexible - even more so
> > >than the earlier proposal we had. It's less proscriptive and
> doesn't make any demands on any other libs.
> > >
> > >Comments/thoughts welcome.
> > Any comments on this RFC before I see about investing further time
> > in it to clean it up a bit and submit as a non-RFC patchset for
> merge in 2.1?
> > >>>I would say there are 2 possible approaches for KNI and ring
> handling:
> > >>>1/ You Bruce, Marc and Keith are advocating for a layer on top of
> > >>>ethdev, ring, KNI and possibly other devices, which uses mbuf. The
> > >>>set of functions is simpler than ethdev but the data structure is
> > >>>mbuf which is related to ethdev layer.
> > >>>2/ Konstantin and Neil talked about keeping mbuf for ethdev layer
> > >>>and related libs only. Ring and KNI could have an ethdev API with a
> > >>>reduced set of implemented functions. Crypto devices could adopt a
> > >>>specific crypto API and an ethdev API at the same time.
> > >>I don't fully understand which APIs you meant by non-ethdev. This
> > >>pktdev wrapper proposal abstracts RX and TX functions only, and all
> > >>of these are using mbufs as the packet buffer abstraction right now
> anyway (ethdev).
> > >>
> > >He's referring to future device classes (like crypto devices), which
> > >ostensibly would make use of the pktdev API.  My argument (and I
> > >think Thomas') is that if a bit of hardware can be made to operate as
> > >a packet sending/receiving device, then its just as reasonable to use
> > >the existing ethdev api rather than some other restricted version of
> > >it (pktdev)
> > >
> > >>This approach does not preclude that different libraries expose
> > >>other API calls. In fact they will have to; setup the port/device
> > >>... It is just a higher level API, so that you don't have to check
> > >>the type of port in your DPDK application I/O loop, minimizing user's
> code.
> > >>
> > >No argument there.  But if thats the case (and I agree that it is),
> > >an application will implicitly have to know what what type of device
> > >it is, because it (the application) will need to understand the
> specific API it is writing to.
> > >
> > >>Or were you in 2) thinking about creating a different "packet buffer"
> > >>abstraction, independent from the ethdev, and then map the different
> > >>port specifics (e.g. mbuf) to this new abstraction?
> > >>
> > >My argument was to just leave the ethdev api alone.  If a device
> > >class can be made to look like a 

[dpdk-dev] [PATCH v2 06/19] enic: move enic PMD to drivers/net directory

2015-05-20 Thread Richardson, Bruce


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, May 20, 2015 4:56 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org; Sujith Sankar
> Subject: Re: [dpdk-dev] [PATCH v2 06/19] enic: move enic PMD to
> drivers/net directory
> 
> 2015-05-15 16:56, Bruce Richardson:
> > move enic PMD to drivers/net directory
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  drivers/net/Makefile |2 +-
> >  drivers/net/enic/LICENSE |   27 +
> >  drivers/net/enic/Makefile|   71 ++
> >  drivers/net/enic/enic.h  |  200 +
> >  drivers/net/enic/enic_clsf.c |  259 ++
> >  drivers/net/enic/enic_compat.h   |  147 
> >  drivers/net/enic/enic_ethdev.c   |  640 +++
> >  drivers/net/enic/enic_main.c | 1117
> ++
> >  drivers/net/enic/enic_res.c  |  219 +
> >  drivers/net/enic/enic_res.h  |  168 
> >  drivers/net/enic/rte_pmd_enic_version.map|4 +
> >  drivers/net/enic/vnic/cq_desc.h  |  126 +++
> >  drivers/net/enic/vnic/cq_enet_desc.h |  261 ++
> >  drivers/net/enic/vnic/rq_enet_desc.h |   76 ++
> >  drivers/net/enic/vnic/vnic_cq.c  |  117 +++
> >  drivers/net/enic/vnic/vnic_cq.h  |  151 
> >  drivers/net/enic/vnic/vnic_dev.c | 1054
> 
> >  drivers/net/enic/vnic/vnic_dev.h |  212 +
> >  drivers/net/enic/vnic/vnic_devcmd.h  |  774 ++
> >  drivers/net/enic/vnic/vnic_enet.h|   78 ++
> >  drivers/net/enic/vnic/vnic_intr.c|   78 ++
> >  drivers/net/enic/vnic/vnic_intr.h|  126 +++
> >  drivers/net/enic/vnic/vnic_nic.h |   88 ++
> >  drivers/net/enic/vnic/vnic_resource.h|   97 +++
> >  drivers/net/enic/vnic/vnic_rq.c  |  245 ++
> >  drivers/net/enic/vnic/vnic_rq.h  |  282 +++
> >  drivers/net/enic/vnic/vnic_rss.c |   85 ++
> >  drivers/net/enic/vnic/vnic_rss.h |   61 ++
> >  drivers/net/enic/vnic/vnic_stats.h   |   86 ++
> >  drivers/net/enic/vnic/vnic_wq.c  |  245 ++
> >  drivers/net/enic/vnic/vnic_wq.h  |  283 +++
> >  drivers/net/enic/vnic/wq_enet_desc.h |  114 +++
> 
> I think that vnic/ should be renamed to base/

Yes, I was wondering about that. However, this wasn't a driver I was familiar 
with and its origins, so I left it as-is.


[dpdk-dev] [PATCH v2 02/19] drivers: create drivers and drivers/net directory

2015-05-20 Thread Richardson, Bruce
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, May 20, 2015 4:05 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 02/19] drivers: create drivers and
> drivers/net directory
> 
> 2015-05-15 16:56, Bruce Richardson:
> > Add a new top-level "drivers" directory to which all PMDs will be
> > moved for easier maintenance of both lib folder and drivers
> > themselves. This new directory is a dependency of all the apps in the
> > app folder, so the makefiles for each app are updated.
> 
> Why drivers are a dependency of every apps?
> Some of them don't even include rte_ethdev.h.

Good point. I just got a broken build initially in the apps with missing 
dependencies, so I made the change so that dependencies were the same as before.

Looking at things, I imagine test-pmd and test-pipeline depend upon the drivers 
since they do packet IO, and the test app will depend on it as it has unit 
tests for the bond-ethdev driver.
Dump-cfg and the test-acl app should not have a dependency on the driver, I 
imagine. 

It appears that the cmdline-test app is missing a DEPDIRS-y line, but it 
probably should depend on the libs, right?

/Bruce


[dpdk-dev] [PATCH] mbuf: add comment explaining confusing code

2015-03-27 Thread Richardson, Bruce


> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Friday, March 27, 2015 4:44 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mbuf: add comment explaining confusing
> code
> 
> On Fri, Mar 27, 2015 at 02:55:27PM +, Bruce Richardson wrote:
> > On Fri, Mar 27, 2015 at 10:38:41AM -0400, Neil Horman wrote:
> > > On Fri, Mar 27, 2015 at 02:30:50PM +, Bruce Richardson wrote:
> > > > On Fri, Mar 27, 2015 at 10:07:35AM -0400, Neil Horman wrote:
> > > > > On Fri, Mar 27, 2015 at 11:32:38AM +, Bruce Richardson wrote:
> > > > > > On Fri, Mar 27, 2015 at 06:29:56AM -0400, Neil Horman wrote:
> > > > > > > On Thu, Mar 26, 2015 at 09:14:54PM +, Bruce Richardson
> wrote:
> > > > > > > > The logic used in the condition check before freeing an
> > > > > > > > mbuf is sometimes confusing, so explain it in a proper
> comment.
> > > > > > > >
> > > > > > > > Signed-off-by: Bruce Richardson
> > > > > > > > 
> > > > > > > > ---
> > > > > > > >  lib/librte_mbuf/rte_mbuf.h | 10 ++
> > > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > b/lib/librte_mbuf/rte_mbuf.h index 17ba791..0265172 100644
> > > > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > @@ -764,6 +764,16 @@ __rte_pktmbuf_prefree_seg(struct
> > > > > > > > rte_mbuf *m)  {
> > > > > > > > __rte_mbuf_sanity_check(m, 0);
> > > > > > > >
> > > > > > > > +   /*
> > > > > > > > +* Check to see if this is the last reference to the
> mbuf.
> > > > > > > > +* Note: the double check here is deliberate. If the
> ref_cnt is "atomic"
> > > > > > > > +* the call to "refcnt_update" is a very expensive
> operation, so we
> > > > > > > > +* don't want to call it in the case where we know we
> are the holder
> > > > > > > > +* of the last reference to this mbuf i.e. ref_cnt == 1.
> > > > > > > > +* If however, ref_cnt != 1, it's still possible that we
> may still be
> > > > > > > > +* the final decrementer of the count, so we need to
> check that
> > > > > > > > +* result also, to make sure the mbuf is freed properly.
> > > > > > > > +*/
> > > > > > > > if (likely (rte_mbuf_refcnt_read(m) == 1) ||
> > > > > > > > likely (rte_mbuf_refcnt_update(m, -1) 
> > > > > > > > == 0))
> {
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.1.0
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > NAK
> > > > > > >  the comment is incorrect, a return code of 1 from
> > > > > > > rte_mbuf_refcnt_read doesn't guarantee you are the last
> > > > > > > holder of the buffer if two contexts have a pointer to it.
> > > > > > If two threads have pointers to it, and are both going to free
> > > > > > it, the refcnt must be 2 not one, otherwise the refcnt is
> meaningless.
> > > > > >
> > > > >
> > > > > What about the other concrete case that I illustrated, where one
> > > > > context is attempting to increment the refcount, while the other
> > > > > is decrementing it with the intention to free?  By making the
> > > > > read and set operation disctinct here you've broken the
> > > > > atomicity of the read and update logic that atomics are there for
> and created a race condition.  I don't know how else to explain this to
> you.
> > > > > if(atomic_read == 1) then atomic_set(0), breaks the entire
> > > > > notion of what atomics are meant to do (namely update and read
> > > > > state as an atomic unit), you just can't get away with not
> > > > > having that atom

[dpdk-dev] Testpmd returns error.

2015-02-22 Thread Richardson, Bruce




On 22 Feb 2015, at 16:19, David Marchand mailto:david.marchand at 6wind.com>> wrote:

Hello,

On Sun, Feb 22, 2015 at 3:17 PM, Thomas Monjalon mailto:thomas.monjalon at 6wind.com>> wrote:
Hi Tetsuya,

> Someone, could you please check it?

It is possible that this patchset was not correctly tested.
We might revert it or try to fix it.
I think the decision should be done by its authors (Danny, Bruce),
or the Linux EAL maintainer (David).

Well, I need to have a deeper look at this change.
I did not find time before my holidays.

- I think there may be an issue with the use of resource0 instead of /dev/uio.
I am not sure uio mmap will be happy or I overlooked something trivial.

- Testing proc type in pci_uio_map_resource() looks wrong to me, since we 
validated earlier in this same function that we are in primary process.

- uio_res->maps indexes are not the same as the pci resources, might trigger 
problems (and it clearly does not make it easy to read ...).
And now we are reading sysfs twice.
I would prefer this code is reworked so that we avoid those loops in 
eal_pci_uio.c.


Anyway, I am still on holiday (I should have kept my laptop away ...), I will 
be back tomorrow.
I suppose Bruce or Danny will come with a fix, let's decide what the best 
solution is at this moment.



I'll take another look at this but unfortunately I don't think I can 
necessarily reproduce this issue as it seems to occur with enic driver, but I 
will try.

--
David Marchand


[dpdk-dev] [PATCH] lpm: fix overflow issue

2015-02-22 Thread Richardson, Bruce
Sorry I missed this Friday. I'll look at it  shortly.



On 21 Feb 2015, at 22:56, Igor Ryzhov mailto:iryzhov at 
nfware.com>> wrote:

Hello again. Will anybody review this patch?
This is really critical issue, because it can lead to memory corruption and 
break any program using LPM.

CCing this to Bruce Richardson, because he is maintainer of LPM.

Regards,
Igor Ryzhov

On Fri, Feb 20, 2015 at 4:16 PM, Igor Ryzhov mailto:iryzhov at nfware.com>> wrote:
LPM table overflow may occur if table is full and added rule has the biggest 
depth that already have some rules.

Signed-off-by: Igor Ryzhov mailto:iryzhov at nfware.com>>
---
 lib/librte_lpm/rte_lpm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 983e04b..cc51210 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -298,6 +298,9 @@ rule_add(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t 
depth,
return rule_index;
}
}
+
+   if (rule_index == lpm->max_rules)
+   return -ENOSPC;
} else {
/* Calculate the position in which the rule will be stored. */
rule_index = 0;
--
1.9.3 (Apple Git-50)




--
Regards,
Igor Ryzhov


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2015-01-08 Thread Richardson, Bruce
My opinion on this is that the lcore_id is rarely (if ever) used to find the 
actual core a thread is being run on. Instead it is used 99% of the time as a 
unique array index per thread, and therefore that we can keep that usage by 
just assigning a valid lcore_id to any extra threads created. The suggestion to 
get/set affinities on top of that seems a good one to me also.

/Bruce

-Original Message-
From: Ananyev, Konstantin 
Sent: Thursday, January 8, 2015 5:06 PM
To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
Cc: dev at dpdk.org
Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore


Hi Steve,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Tuesday, December 23, 2014 9:52 AM
> To: Stephen Hemminger; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> lcore
> 
> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, December 23, 2014 2:29 AM
> > To: Richardson, Bruce
> > Cc: Liang, Cunming; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> > lcore
> >
> > On Mon, 22 Dec 2014 09:46:03 +
> > Bruce Richardson  wrote:
> >
> > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more 
> > > > > applications would be broken to start having to use thread_id 
> > > > > in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of 
> > > > > scenarios where
> > it's
> > > > > important to an app to determine the cpu it's running on, 
> > > > > compared to the
> > large
> > > > > number of cases where you need to have a data-structure per 
> > > > > thread. In
> > DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a 
> > > > > large number
> > of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better 
> > > > > to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique 
> > > > thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents 
> > > > the logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents 
> > > > an unique
> > id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest 
> > > > to be used only
> > in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value 
> > > > no matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 
> > > > base on this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id 
> > > values greater than RTE_MAX_LCORE is terrible, as how will people 
> > > know how to dimension
> > arrays
> > > to be indexes by lcore id? Given the choice, if we are not going 
> > > to just use lcore_id as a generic thread id, which is always 
> > > between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. 
> > > However, it should have a bounded range.
> > > From an ease-of-porting perspective, I still think that the 
> > > simplest option is to use the existing lcore_id and accept the 
> > > fact that it's now a thread id rather than an actual physical 
> > > lcore. Question is, is would that cause us lots of issues in the future?
> > >
> > > /Bruce
> >
> > The current rte_lcore_id() has different meaning the thread. Your 
> > proposal will break code that uses lcore_id to do per-cpu statistics 
> > and the lcore_config code in the samples.
> > q
> [Liang, Cunming] +1.

Few more thoughts on that subject:

Actually one more place i

[dpdk-dev] Cannot mmap device resource in DPDK 1.7.0 multi-process/multi-thread

2014-10-28 Thread Richardson, Bruce
> -Original Message-
> From: Mario Gianni [mailto:m.gianni at engineer.com]
> Sent: Friday, October 24, 2014 4:03 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Cannot mmap device resource in DPDK 1.7.0 multi-
> process/multi-thread
> 
> So you are telling me that in order to implement multi-process I should better
> use the l2fwd_fork example instead of client_server_mp.
> In fact if I use the client_server_mp with a lot of mp_client threads it 
> gives me
> the error.
> If instead I use the l2fwd_fork example it doesn't give me the error.
> 
> One more question at this point:
> Assume that I use l2fwd_fork, when I launch the secondary process, how do I
> assign the lcore coremask associated with that process?
> 

With the l2fwd_fork example, the forked processes are not secondary proceses in 
the sense of the other DPDK multiprocess models - with each one having its own 
core mask and set of threads. Instead, each forked process is a single thread 
based off the core-mask provided at process startup. It's a different way of 
doing things, but one which provides separate processes, while guaranteeing 
that all processes can share the same hugepage memory at the same address 
without relying on having the kernel map things at specific points in the 
address space.

/Bruce


[dpdk-dev] [PATCH v2 04/13] ethdev: support of multiple sizes of redirection table

2014-10-28 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, October 28, 2014 10:10 AM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 04/13] ethdev: support of multiple sizes of
> redirection table
> 
> 2014-10-28 00:33, Zhang, Helin:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2014-09-25 16:40, Helin Zhang:
> > > >  /* Definitions used for redirection table entry size */
> > > > -#define ETH_RSS_RETA_NUM_ENTRIES 128
> > > > -#define ETH_RSS_RETA_MAX_QUEUE   16
> > > > +#define ETH_RSS_RETA_SIZE_64  64
> > > > +#define ETH_RSS_RETA_SIZE_128 128
> > > > +#define ETH_RSS_RETA_SIZE_512 512
> > > > +
> > > > +#define RTE_BIT_WIDTH_64 (CHAR_BIT * sizeof(uint64_t))
> > >
> > > Are these constants really needed?
> >
> > These constants were defined for the third input parameter of
> > rte_eth_dev_rss_reta_update() and rte_eth_dev_rss_reta_query(). End users
> need
> > to give the correct reta size listed as above, as other values is not 
> > valid. So it
> would be
> > better to list the valid reta sizes in macros here.
> 
If only limited range of values are allowed, would an enum work better than a 
set of macros? 

> OK, so you should explain that only these values are allowed.
> In general, it's something we explain in the comment of the function.
> 
> By the way, why only these values are allowed?


[dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet

2014-10-22 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Wednesday, October 22, 2014 3:53 PM
> To: Neil Horman; Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
> cycles/packet
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > Sent: Wednesday, October 22, 2014 3:03 PM
> > To: Liang, Cunming
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
> cycles/packet
> >
> > On Tue, Oct 21, 2014 at 01:17:01PM +, Liang, Cunming wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > Sent: Tuesday, October 21, 2014 6:33 PM
> > > > To: Liang, Cunming
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
> > > > cycles/packet
> > > >
> > > > On Sun, Oct 12, 2014 at 11:10:39AM +, Liang, Cunming wrote:
> > > > > Hi Neil,
> > > > >
> > > > > Very appreciate your comments.
> > > > > I add inline reply, will send v3 asap when we get alignment.
> > > > >
> > > > > BRs,
> > > > > Liang Cunming
> > > > >
> > > > > > -Original Message-
> > > > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > > Sent: Saturday, October 11, 2014 1:52 AM
> > > > > > To: Liang, Cunming
> > > > > > Cc: dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx 
> > > > > > and tx
> > > > cycles/packet
> > > > > >
<...snip...>
> > > > > >
> > > > > > > + printf("Force Stop!\n");
> > > > > > > + stop = 1;
> > > > > > > + }
> > > > > > > + if (signum == SIGUSR2)
> > > > > > > + stats_display(0);
> > > > > > > +}
> > > > > > > +/* main processing loop */
> > > > > > > +static int
> > > > > > > +main_loop(__rte_unused void *args)
> > > > > > > +{
> > > > > > > +#define PACKET_SIZE 64
> > > > > > > +#define FRAME_GAP 12
> > > > > > > +#define MAC_PREAMBLE 8
> > > > > > > + struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> > > > > > > + unsigned lcore_id;
> > > > > > > + unsigned i, portid, nb_rx = 0, nb_tx = 0;
> > > > > > > + struct lcore_conf *conf;
> > > > > > > + uint64_t prev_tsc, cur_tsc;
> > > > > > > + int pkt_per_port;
> > > > > > > + uint64_t packets_per_second, total_packets;
> > > > > > > +
> > > > > > > + lcore_id = rte_lcore_id();
> > > > > > > + conf = _conf[lcore_id];
> > > > > > > + if (conf->status != LCORE_USED)
> > > > > > > + return 0;
> > > > > > > +
> > > > > > > + pkt_per_port =  MAX_TRAFIC_BURST / conf->nb_ports;
> > > > > > > +
> > > > > > > + int idx = 0;
> > > > > > > + for (i = 0; i < conf->nb_ports; i++) {
> > > > > > > + int num = pkt_per_port;
> > > > > > > + portid = conf->portlist[i];
> > > > > > > + printf("inject %d packet to port %d\n", num, portid);
> > > > > > > + while (num) {
> > > > > > > + nb_tx = RTE_MIN(MAX_PKT_BURST, num);
> > > > > > > + nb_tx = rte_eth_tx_burst(portid, 0,
> > > > > > > + _burst[idx], nb_tx);
> > > > > > > + num -= nb_tx;
> > > > > > > + idx += nb_tx;
> > > > > > > + }
> > > > > > > + }
> > > > > > > + printf("Total packets inject to prime ports = %u\n", idx);
> > > > > > > +
> > > > > > > + packets_per_second = (link_mbps * 1000 * 1000) /
> > > > > > > + +((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) *
> CHAR_BIT);
> > > > > > > + printf("Each port will do %"PRIu64" packets per second\n",
> > > > > > > + +packets_per_second);
> > > > > > > +
> > > > > > > + total_packets = RTE_TEST_DURATION * conf->nb_ports *
> > > > > > packets_per_second;
> > > > > > > + printf("Test will stop after at least %"PRIu64" packets
> received\n",
> > > > > > > + + total_packets);
> > > > > > > +
> > > > > > > + prev_tsc = rte_rdtsc();
> > > > > > > +
> > > > > > > + while (likely(!stop)) {
> > > > > > > + for (i = 0; i < conf->nb_ports; i++) {
> > > > > > > + portid = conf->portlist[i];
> > > > > > > + nb_rx = rte_eth_rx_burst((uint8_t) portid, 0,
> > > > > > > +  pkts_burst,
> MAX_PKT_BURST);
> > > > > > > + if (unlikely(nb_rx == 0)) {
> > > > > > > + idle++;
> > > > > > > + continue;
> > > > > > > + }
> > > > > > > +
> > > > > > > + count += nb_rx;
> > > > > > Doesn't take into consideration error conditions.  rte_eth_rx_burst 
> > > > > > can
> > > > return
> > > > > > -ENOTSUP
> > > > > [Liang, Cunming] It returns -ENOTSUP when turning on ETHDEV_DEBUG
> > > > CONFIG.
> > > > > The error is used to avoid no function call register.
> > > > > When ETHDEV_DEBUG turn off, the NULL function call cause segfault
> directly.
> > > > > So 

[dpdk-dev] [PATCH] ixgbe: Fix clang compilation issue

2014-10-22 Thread Richardson, Bruce
Self-nak, resent old patch.

> -Original Message-
> From: Richardson, Bruce
> Sent: Wednesday, October 22, 2014 11:54 AM
> To: dev at dpdk.org
> Cc: Richardson, Bruce
> Subject: [PATCH] ixgbe: Fix clang compilation issue
> 
> Issue reported by Keith Wiles.
> Clang fails with an error about a variable being used uninitialized:
> 
>   CC ixgbe_rxtx_vec.o
> /home/keithw/projects/dpdk-code/org-
> dpdk/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:67:30:
> error: variable 'dma_addr0' is uninitialized
>   when used here [-Werror,-Wuninitialized]
> dma_addr0 = _mm_xor_si128(dma_addr0, dma_addr0);
>   ^
> 
> This error can be fixed by replacing the call to xor which
> takes two parameters, by a call to setzero, which does not take any.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> index 457f267..2236250 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> @@ -64,7 +64,7 @@ ixgbe_rxq_rearm(struct igb_rx_queue *rxq)
>RTE_IXGBE_RXQ_REARM_THRESH) < 0) {
>   if (rxq->rxrearm_nb + RTE_IXGBE_RXQ_REARM_THRESH >=
>   rxq->nb_rx_desc) {
> - dma_addr0 = _mm_xor_si128(dma_addr0, dma_addr0);
> + dma_addr0 = _mm_setzero_si128();
>   for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
>   rxep[i].mbuf = >fake_mbuf;
>   _mm_store_si128((__m128i *)[i].read,
> --
> 1.9.3



[dpdk-dev] [PATCH] KNI: fix compilation warning 'missing-field-initializers'

2014-10-22 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Marc Sune
> Sent: Wednesday, October 22, 2014 10:50 AM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] KNI: fix compilation warning 'missing-field-
> initializers'
> 
> On 22/10/14 10:50, Thomas Monjalon wrote:
> > 2014-10-22 10:42, Marc Sune:
> >> The mutex needs to be initialized to RTE_SPINLOCK_INITIALIZER(0) too, or
> >> move the initialization of the mutex to rte_kni_init().
> > RTE_SPINLOCK_INITIALIZER is { 0 }
> > By initializing one field, all other fields are set to 0, so spinlock also.
> > Just choose one field and it's OK.
> > It should be tested with ICC also but I think it's OK.
> 
> Seems that you are right, at least for C99:
> 
> C99 Standard 6.7.8.21
> 
>  If there are fewer initializers in a brace-enclosed list than
> there are elements or members of an aggregate, or fewer characters
> in a string literal used to initialize an array of known size than
> there are elements in the array, the remainder of the aggregate
> shall be initialized implicitly the same as objects that have static
> storage duration.
> 
> 
> I am not sure if there can be problems with other C dialects (e.g. C11),
> I don't have the std here. So to prevent any problem with them (could
> produce a dead-lock during first rte_kni_alloc() that could be difficult
> to troubleshoot), I would still explicitly initialize the mutex, in one
> or the other way.
> 
> Just tell me if you agree and which one you prefer.
> 
> I don't have an ICC license. I am always trying it with GCC and clang.
> 
> Marc

ICC should be fine with this, it handles just initializing a single member of a 
structure as described by Thomas above.

/Bruce


[dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet

2014-10-21 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Tuesday, October 21, 2014 11:33 AM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
> cycles/packet
> 
> > >
> > > > +   if (count == 0)
> > > > +   return -1;
> > > > +
> > > > +   printf("%lu packet, %lu drop, %lu idle\n", count, drop, idle);
> > > > +   printf("Result: %ld cycles per packet\n", (cur_tsc - prev_tsc) 
> > > > / count);
> > > > +
> > > Bad math here.  Theres no guarantee that the tsc hasn't wrapped 
> > > (potentially
> > > more than once) depending on your test length.  you need to check the tsc
> before
> > > and after each burst and record an average of deltas instead, accounting 
> > > in
> each
> > > instance for the possibility of wrap.
> > [Liang, Cunming] I'm not sure catch your point correctly.
> > I think both cur_tsc and prev_tsc are 64 bits width.
> > For 3GHz, I think it won't wrapped so quick.
> > As it's uint64_t, so even get wrapped, the delta should still be correct.
> But theres no guarantee that the tsc starts at zero when you begin your test.
> The system may have been up for a long time and near wrapping already.
> Regardless, you need to account for the possibility that cur_tsc is smaller
> than prev_tsc, or this breaks.
> 

The tsc. is 64-bit and so only wraps around every couple of hundred years or so 
on a 2GHz machine, so I don't think it's necessary to handle that case. 

/Bruce


[dpdk-dev] development/integration branch?

2014-10-21 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Marc Sune
> Sent: Tuesday, October 21, 2014 9:23 AM
> To: 
> Subject: [dpdk-dev] development/integration branch?
> 
> Good morning,
> 
> Some DPDK users, including myself, use a clone of the git repository to
> compile DPDK for their applications, instead of downloading the tarball
> of each release.
> 
> In my opinion, it would be useful for such users that the master branch
> contains only stable releases, to prevent (mistakenly) to use a wip DPDK
> version, and jump quickly to the latest stable with a simple git pull
> without having to check the tags. Also new users would clone the repo
> and get only the stable release.
> 
> So I would propose to use an integration/development branch, where the
> patches are integrated and only push to master once a stable release is
> tagged in this integration branch.
> 
> Thoughts?
> 
> best
> marc

Ideally, our master branch should always be good and stable, but given reality 
often interferes with such good intentions I think that having dev branches is 
not a bad idea. However, what we may lose by doing so is having a larger group 
of people constantly using the master branch and reporting problems to us. 

On balance, I'd be slightly in favour of this suggestion.

/Bruce


[dpdk-dev] [PATCH] lib/librte_ether: new filter APIs definition

2014-10-17 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, October 17, 2014 10:08 AM
> To: Wu, Jingjing
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: new filter APIs definition
> 
> 2014-10-17 07:29, Jingjing Wu:
> > Define new APIs to support configure multi-kind filters using same APIs,
> > instead of creating each API set for each kind of filter.
> >  - rte_eth_dev_filter_supported
> >  - rte_eth_dev_filter_ctrl
> >
> > Filter types, operations, and structures are defined specifically
> > in new header file lib/librte_eth/rte_dev_ctrl.h.
> >
> > As to the implementation discussion, please refer to
> > http://dpdk.org/ml/archives/dev/2014-September/005179.html
> [...]
> > --- /dev/null
> > +++ b/lib/librte_ether/rte_eth_ctrl.h
> 
> Why this name? I think we can reserve this file for filtering API.
> So rte_eth_rx_filter.h would be more appropriate.
> 
> > +/**
> > + * All generic operations to filters
> > + */
> 
> rewording: "Generic operations on filters"
> Could you elaborate on "generic"? What would mean "specific"?
> 
> > +enum rte_filter_op {
> > +   RTE_ETH_FILTER_OP_NONE = 0,
> > +   /**< used to check whether the type filter is supported */
> > +   RTE_ETH_FILTER_OP_ADD,  /**< add filter entry */
> > +   RTE_ETH_FILTER_OP_UPDATE,   /**< update filter entry */
> > +   RTE_ETH_FILTER_OP_DELETE,   /**< delete filter entry */
> > +   RTE_ETH_FILTER_OP_FLUSH,/**< flush all entries */
> > +   RTE_ETH_FILTER_OP_GET,  /**< get filter entry */
> > +   RTE_ETH_FILTER_OP_SET,  /**< configurations */
> > +   RTE_ETH_FILTER_OP_GET_INFO,
> 
> Could we remove "OP", except for OP_NONE and OP_MAX?

OP_NONE ==> NOP or NOOP perhaps?

/Bruce


[dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture

2014-10-16 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chao CH Zhu
> Sent: Thursday, October 16, 2014 4:14 AM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power
> architecture
> 
> Konstantin,
> 
> In my understanding, compiler barrier is a kind of software barrier which
> prevents the compiler from moving memory accesses across the barrier. This
> should be architecture-independent. And the "sync" instruction is a
> hardware barrier which depends on PowerPC architecture. So I think the
> compiler barrier should be the same on x86 and PowerPC. Any comments?
> Please correct me if I was wrong.
> 
I would agree with that assessment, as far as it goes, in that a compiler 
barrier is going to be the same on both architectures. However, we also need to 
start thinking about actual use cases - how to we specify the barriers in a 
piece of code where we need a full memory barrier on PPC and only a compiler 
barrier on IA? 
My suggestion would be to do first as you propose and have proper primitives 
for the different barrier types defined correctly for each platform - with the 
compiler barrier being, presumably, common across each one. Then, as a second 
step, we probably need to look at defining "logical" barrier types (for want of 
a better term) that can then be used in the code and which would be different 
across platforms.

Does this make sense to do this way? Is it the best solution? Do we want to 
define the basic primitives or are we only ever likely to need the logical 
barrier types?

/Bruce


[dpdk-dev] kernel panic when stop my test demo

2014-10-16 Thread Richardson, Bruce


> -Original Message-
> From: Lilijun [mailto:jerry.lilijun at huawei.com]
> Sent: Thursday, October 16, 2014 3:40 AM
> To: Richardson, Bruce; dev at dpdk.org; stephen at networkplumber.org
> Subject: Re: [dpdk-dev] kernel panic when stop my test demo
> 
> On 2014/10/15 18:08, Richardson, Bruce wrote:
> >
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Lilijun
> >> Sent: Wednesday, October 15, 2014 10:43 AM
> >> To: dev at dpdk.org; stephen at networkplumber.org
> >> Subject: Re: [dpdk-dev] kernel panic when stop my test demo
> >>
> >> Hi all,
> >>
> >> After adding unmap uio resources operations in process signal handler
> functions,
> >> An new error was found as follows:
> >> Call Trace:
> >>  [] uio_release+0x40/0x60 [uio]
> >>  [] __fput+0xe9/0x270
> >>  [] fput+0xe/0x10
> >>  [] task_work_run+0xa7/0xe0
> >>  [] do_notify_resume+0x97/0xb0
> >>  [] int_signal+0x12/0x17
> >>
> >> The code for unmap uio resources is shown:
> >> static void pci_dev_uio_unmap(struct rte_pci_device *pci_dev, uint8_t
> port_id)
> >> {
> >> int i;
> >>
> >> RTE_LOG(INFO, EAL, "begin unmap port %d uio resource! \n", 
> >> port_id);
> >> if (NULL == pci_dev)
> >> {
> >> RTE_LOG(ERR, EAL, "begin unmap port %d uio resource! \n",
> port_id);
> >> return;
> >> }
> >>
> >> for (i = 0; i != PCI_MAX_RESOURCE; i++)
> >> {
> >> /* skip empty BAR */
> >> if (0 == pci_dev->mem_resource[i].phys_addr)
> >> continue;
> >> if (munmap(pci_dev->mem_resource[i].addr, pci_dev-
> >>> mem_resource[i].len)
> >> == 
> >> -1){
> >> RTE_LOG(ERR, EAL, "Error with munmap\n");
> >> return;
> >> }
> >> }
> >> if (close(pci_dev->intr_handle.fd) == -1){
> >> RTE_LOG(ERR, EAL, "Error closing interrupt handle\n");
> >> return;
> >> }
> >> pci_dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> >> RTE_LOG(INFO, EAL, "unmap port %d uio resource successfully!\n",
> >> port_id);
> >> }
> >>
> >> Does anyone has some ideas?
> >>
> >> Thanks for any help.
> >> Jerry
> >>
> >> On 2014/10/14 19:58, Lilijun wrote:
> >>> Hi Stephen and all,
> >>>
> >>> I have a same problem as this older email describes on Aug 14, 2013.
> >>> Any help will be appreciated.
> >>>
> >>> The details is shown as follows.
> >>> The key step implementation of my demo is:
> >>> 1. Firstly, call rte_eal_init() to do some initialization.
> >>> 2. Switch the driver of my Intel  82599 NIC from ixgbe.ko to igb_uio.ko
> >>> like tools/dpdk_nic_bind.py written in C source code.
> >>> 3. Configure rte_dev and start it.
> >>> 4. Do some rx/tx tests.
> >>> 5. call rte_eth_dev_stop(dpdk_port_id) to stop the hardware as your 
> >>> history
> >> emails.
> >>> 6. Switch the driver of the NIC from igb_uio.ko to ixgbe.ko.
> >>> 7. Kill the demo using commands: kill -9.
> >
> > Just to clarify one point - you have an application running which was using 
> > the
> NICs with DPDK while you remove the uio driver and replace it with ixgbe?
> I would expect doing such a thing to cause problems as stopping the device 
> does
> not cause the NIC BAR memory to be unmapped from the DPDK process.
> Therefore removing the driver providing that memory map and getting another
> driver to start using those same BARs would not be recommended.
> >
> 
> Thanks for your reply.
> Yes, I want to change the NIC driver by replacing the uio driver with ixgbe in
> order to recover the NIC to origin kernel ether-net devices while keeping the
> application running.
> Then my application can use the NICs with DPDK or with kernel ixgbe driver on
> demand.
> I am confusing with how to release all uio resources when stop my application.
> 
> Would you like to give me any suggestions for my requirements?
> 

Right now, there is no way to do this without changing the internals of the 
DPDK itself. The BARs from the NIC are mmapped permanently into the processes 
address space on initialization of the application, and are never released. 
You'd basically need to write code to un-initialize the DPDK and then 
reinitialize it at a later point.
Might an alternative be to actually have two separate applications or binaries 
that appear as one, or work as one? Then you could shut down the dpdk binary 
before removing the uio driver, and switch over to the ixgbe driver and use the 
other application. However, I realise that getting a seamless transition could 
be difficult there.

/Bruce


[dpdk-dev] [PATCH v2] Adding the routines rte_pktmbuf_alloc_bulk() and rte_pktmbuf_free_bulk()

2014-10-07 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Keith Wiles
> Sent: Monday, October 06, 2014 9:26 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2] Adding the routines rte_pktmbuf_alloc_bulk()
> and rte_pktmbuf_free_bulk()
> 
> Minor helper routines to mirror the mempool routines and remove the code
> from applications. The ixgbe_rxtx_vec.c routine could be changed to use
> the ret_pktmbuf_alloc_bulk() routine inplace of rte_mempool_get_bulk().
> 

Can we maybe remove the reference to the vector driver for this patch. I don't 
believe changes there are what we want to do.
/Bruce

> Combined __rte_mbuf_raw_alloc_bulk into the rte_pktmbuf_alloc_bulk
> function.
> Fixed up the comments to state the correct return values.
> 
> Signed-off-by: Keith Wiles 
> ---
>  lib/librte_mbuf/rte_mbuf.h | 55
> ++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 1c6e115..337611b 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -671,6 +671,44 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  }
> 
>  /**
> + * Allocate a list of mbufs from a mempool into a mbuf array.
> + *
> + * This mbuf list contains one segment per mbuf, which has a length of 0. The
> pointer
> + * to data is initialized to have some bytes of headroom in the buffer
> + * (if buffer size allows).
> + *
> + * The routine is just a simple wrapper routine to reduce code in the 
> application
> and
> + * provide a cleaner API for multiple mbuf requests.
> + *
> + * @param mp
> + *   The mempool from which the mbuf is allocated.
> + * @param m_list
> + *   An array of mbuf pointers, cnt must be less then or equal to the size 
> of the
> array.
> + * @param cnt
> + *   Number of slots in the m_list array to fill.
> + * @return
> + *   - cnt is returned if a successful allocation.
> + *   - <0 negative number is an error code.
> + */
> +static inline int __attribute__((always_inline))
> +rte_pktmbuf_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf *m_list[],
> int16_t cnt)
> +{
> + int ret;
> +
> + ret = rte_mempool_get_bulk(mp, (void **)m_list, cnt);
> + if ( ret == 0 ) {
> + ret = cnt;
> + while(cnt--) {
> +#ifdef RTE_MBUF_REFCNT
> + rte_mbuf_refcnt_set(*m_list, 1);
> +#endif /* RTE_MBUF_REFCNT */
> + rte_pktmbuf_reset(*m_list++);
> + }
> + }
> + return ret;
> +}
> +
> +/**
>   * Free a segment of a packet mbuf into its original mempool.
>   *
>   * Free an mbuf, without parsing other segments in case of chained
> @@ -708,6 +746,23 @@ static inline void rte_pktmbuf_free(struct rte_mbuf
> *m)
>   }
>  }
> 
> +/**
> + * Free a list of packet mbufs back into its original mempool.
> + *
> + * Free a list of mbufs by calling rte_pktmbuf_free() in a loop as a wrapper
> function.
> + *
> + * @param m_list
> + *   An array of rte_mbuf pointers to be freed.
> + * @param npkts
> + *   Number of packets to free in m_list.
> + */
> +static inline void __attribute__((always_inline))
> +rte_pktmbuf_free_bulk(struct rte_mbuf *m_list[], int16_t npkts)
> +{
> + while(npkts--)
> + rte_pktmbuf_free(*m_list++);
> +}
> +
>  #ifdef RTE_MBUF_REFCNT
> 
>  /**
> --
> 2.1.0



[dpdk-dev] [PATCH 2/2] Adding the routines rte_pktmbuf_alloc_bulk() and rte_pktmbuf_free_bulk()

2014-10-06 Thread Richardson, Bruce


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Keith Wiles
> Sent: Sunday, October 05, 2014 12:10 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 2/2] Adding the routines rte_pktmbuf_alloc_bulk()
> and rte_pktmbuf_free_bulk()
> 
> Minor helper routines to mirror the mempool routines and remove the code
> from applications. The ixgbe_rxtx_vec.c routine could be changed to use
> the ret_pktmbuf_alloc_bulk() routine inplace of rte_mempool_get_bulk().
> 

I believe such a change would cause a performance regression, as the extra init 
code in the alloc_bulk() function would take additional cycles and is not 
needed. The vector routines use the mempool function directly, so that there is 
no overhead of mbuf initialization, as the vector routines use their additional 
"knowledge" of what the mbufs will be used for to init them in a faster manner 
than can be done inside the mbuf library.

/Bruce

> Signed-off-by: Keith Wiles 
> ---
>  lib/librte_mbuf/rte_mbuf.h | 77
> ++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 1c6e115..f298621 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -546,6 +546,41 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf
> *m)
>  }
> 
>  /**
> + * @internal Allocate a list of mbufs from mempool *mp*.
> + * The use of that function is reserved for RTE internal needs.
> + * Please use rte_pktmbuf_alloc_bulk().
> + *
> + * @param mp
> + *   The mempool from which mbuf is allocated.
> + * @param m_list
> + *   The array to place the allocated rte_mbufs pointers.
> + * @param cnt
> + *   The number of mbufs to allocate
> + * @return
> + *   - 0 if the number of mbufs allocated was ok
> + *   - <0 is an ERROR.
> + */
> +static inline int __rte_mbuf_raw_alloc_bulk(struct rte_mempool *mp, struct
> rte_mbuf *m_list[], int cnt)
> +{
> + struct rte_mbuf *m;
> + int ret;
> +
> + ret = rte_mempool_get_bulk(mp, (void **)m_list, cnt);
> + if ( ret == 0 ) {
> + int i;
> + for(i = 0; i < cnt; i++) {
> + m = *m_list++;
> +#ifdef RTE_MBUF_REFCNT
> + rte_mbuf_refcnt_set(m, 1);
> +#endif /* RTE_MBUF_REFCNT */
> + rte_pktmbuf_reset(m);
> + }
> + ret = cnt;
> + }
> + return ret;
> +}
> +
> +/**
>   * Allocate a new mbuf from a mempool.
>   *
>   * This new mbuf contains one segment, which has a length of 0. The pointer
> @@ -671,6 +706,32 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  }
> 
>  /**
> + * Allocate a list of mbufs from a mempool into a mbufs array.
> + *
> + * This mbuf list contains one segment per mbuf, which has a length of 0. The
> pointer
> + * to data is initialized to have some bytes of headroom in the buffer
> + * (if buffer size allows).
> + *
> + * The routine is just a simple wrapper routine to reduce code in the 
> application
> and
> + * provide a cleaner API for multiple mbuf requests.
> + *
> + * @param mp
> + *   The mempool from which the mbuf is allocated.
> + * @param m_list
> + *   An array of mbuf pointers, cnt must be less then or equal to the size 
> of the
> list.
> + * @param cnt
> + *   Number of slots in the m_list array to fill.
> + * @return
> + *   - The number of valid mbufs pointers in the m_list array.
> + *   - Zero if the request cnt could not be allocated.
> + */
> +static inline int __attribute__((always_inline))
> +rte_pktmbuf_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf *m_list[],
> int16_t cnt)
> +{
> + return __rte_mbuf_raw_alloc_bulk(mp, m_list, cnt);
> +}
> +
> +/**
>   * Free a segment of a packet mbuf into its original mempool.
>   *
>   * Free an mbuf, without parsing other segments in case of chained
> @@ -708,6 +769,22 @@ static inline void rte_pktmbuf_free(struct rte_mbuf
> *m)
>   }
>  }
> 
> +/**
> + * Free a list of packet mbufs back into its original mempool.
> + *
> + * Free a list of mbufs by calling rte_pktmbuf_free() in a loop as a wrapper
> function.
> + *
> + * @param m_list
> + *   An array of rte_mbuf pointers to be freed.
> + * @param npkts
> + *   Number of packets to free in list.
> + */
> +static inline void rte_pktmbuf_free_bulk(struct rte_mbuf *m_list[], int16_t
> npkts)
> +{
> + while(npkts--)
> + rte_pktmbuf_free(*m_list++);
> +}
> +
>  #ifdef RTE_MBUF_REFCNT
> 
>  /**
> --
> 2.1.0



[dpdk-dev] [PATCH v2 3/5] testpmd: Change rxfreet default to 32

2014-09-24 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, September 23, 2014 6:03 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 3/5] testpmd: Change rxfreet default to 32
> 
> On Tue, Sep 23, 2014 at 12:08:15PM +0100, Bruce Richardson wrote:
> > To improve performance by using bulk alloc or vectored RX routines, we
> > need to set rx free threshold (rxfreet) value to 32, so make this the
> > testpmd default.
> >
> > Thirty-two is the minimum setting needed to enable either the
> > bulk alloc or vector RX routines inside the ixgbe driver, so it's
> > best made the default for that reason. Please see
> > "check_rx_burst_bulk_alloc_preconditions()" in ixgbe_rxtx.c, and
> > RX function assignment logic in "ixgbe_dev_rx_queue_setup()" in
> > the same file.
> >
> > The difference in IO performance for testpmd when called without any
> > optional parameters, and using 10G NICs using the ixgbe driver, can be
> > significant - approx 25% or more.
> >
> > Updates in V2:
> > * Updated commit message with additional details
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  app/test-pmd/testpmd.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > index 9f6cdc4..f76406f 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -225,7 +225,9 @@ struct rte_eth_thresh tx_thresh = {
> >  /*
> >   * Configurable value of RX free threshold.
> >   */
> > -uint16_t rx_free_thresh = 0; /* Immediately free RX descriptors by 
> > default. */
> > +uint16_t rx_free_thresh = 32; /* Refill RX descriptors once every 32 
> > packets,
> > +   This setting is needed for ixgbe to enable bulk alloc or vector
> > +   receive functionality. */
> 
> I thought we were talking about making this a pmd private selectable item, or
> allowing a reserved "let the pmd decide" setting.  Or are we saving that for a
> later time?
> 

Yes, we are looking at that - and hopefully we can also get a patch for that in 
for our next release. However, I've left this patch in just in case that 
doesn't actually happen, as the performance improvements for 10G are just too 
good to leave aside for the sake of a 1-line change. Ideally, I'd like this go 
to in, and then be replaced by a "proper" fix.

/Bruce

> Neil
> 
> >
> >  /*
> >   * Configurable value of RX drop enable.
> > --
> > 1.9.3
> >
> >


[dpdk-dev] compile error with linuxapp-clang target on Fedora 20 with 3.15.10 kernel

2014-09-22 Thread Richardson, Bruce
Hi all,

just looking to see if anyone has any suggestions to help me debug an issue I'm 
seeing here. Basically, the clang target is no longer working for me on Fedora 
20 -due to errors when compiling up the kernel modules. The interesting thing 
is that the gcc target works fine, while the clang target doesn't - despite the 
fact that gcc is used as the compiler for the kernel modules in both builds. 
Something else about the clang target is affecting the kernel compile.

>From my investigation, it looks like the compiler flags used in both cases are 
>different, but I'm not sure why. The output of compiling up the first of the 
>kni files is shown below, first for a regular gcc target and secondly for the 
>clang target. From what I read, some initial support for clang compiler went 
>into the 3.15 kernels - is it possible that clang is being incorrectly 
>detected as the kernel compiler by the kernel build system in the second case?

Regards,
/Bruce

 GCC TARGET COMPILE
== Build lib/librte_eal/linuxapp/kni
make -C /usr/src/kernels/3.15.10-201.fc20.x86_64 \
KBUILD_SRC=/usr/src/kernels/3.15.10-201.fc20.x86_64 \
KBUILD_EXTMOD="/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni"
 -f /usr/src/kernels/3.15.10-201.fc20.x86_64/Makefile \

test -e include/generated/autoconf.h -a -e include/config/auto.conf || (
\
echo >&2;   \
echo >&2 "  ERROR: Kernel configuration is invalid.";   \
echo >&2 " include/generated/autoconf.h or include/config/auto.conf are 
missing.";\
echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix 
it.";  \
echo >&2 ;  \
/bin/false)
mkdir -p 
/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/.tmp_versions
 ; rm -f 
/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/.tmp_versions/*
make -f /usr/src/kernels/3.15.10-201.fc20.x86_64/scripts/Makefile.build 
obj=/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni
  gcc 
-Wp,-MD,/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/.ixgbe_main.o.d
  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.3/include 
-I/usr/src/kernels/3.15.10-201.fc20.x86_64/arch/x86/include 
-Iarch/x86/include/generated  
-I/usr/src/kernels/3.15.10-201.fc20.x86_64/include -Iinclude 
-I/usr/src/kernels/3.15.10-201.fc20.x86_64/arch/x86/include/uapi 
-Iarch/x86/include/generated/uapi 
-I/usr/src/kernels/3.15.10-201.fc20.x86_64/include/uapi 
-Iinclude/generated/uapi -include 
/usr/src/kernels/3.15.10-201.fc20.x86_64/include/linux/kconfig.h   
-I/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni
 -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs 
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration 
-Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mno-mmx -mno-sse 
-mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mtune=generic 
-mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args 
-DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 
-DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 
-pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx 
-mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -fstack-protector-strong 
-Wno-unused-but-set-variable -fno-omit-frame-pointer 
-fno-optimize-sibling-calls -fno-var-tracking-assignments -g -pg -mfentry 
-DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign 
-fno-strict-overflow -fconserve-stack -Werror=implicit-int 
-Werror=strict-prototypes -DCC_HAVE_ASM_GOTO   
-I/home/bruce/mbuf_rework/lib/librte_eal/linuxapp/kni --param 
max-inline-insns-single=50   
-I/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/include   
-I/home/bruce/mbuf_rework/lib/librte_eal/linuxapp/kni/ethtool/ixgbe   
-I/home/bruce/mbuf_rework/lib/librte_eal/linuxapp/kni/ethtool/igb -include 
/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/include/rte_config.h -Wall 
-Werror  -DMODULE  -D"KBUILD_STR(s)=#s" 
-D"KBUILD_BASENAME=KBUILD_STR(ixgbe_main)"  
-D"KBUILD_MODNAME=KBUILD_STR(rte_kni)" -c -o 
/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.o
 
/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.c

 CLANG TARGET COMPILE (MODULE COMPILED USING GCC)
== Build lib/librte_eal/linuxapp/kni
make -C /usr/src/kernels/3.15.10-201.fc20.x86_64 \
KBUILD_SRC=/usr/src/kernels/3.15.10-201.fc20.x86_64 \
KBUILD_EXTMOD="/home/bruce/mbuf_rework/x86_64-native-linuxapp-clang/build/lib/librte_eal/linuxapp/kni"
 -f /usr/src/kernels/3.15.10-201.fc20.x86_64/Makefile \

test -e include/generated/autoconf.h -a -e include/config/auto.conf || (
\
echo 

[dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32

2014-09-19 Thread Richardson, Bruce


> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Friday, September 19, 2014 11:25 AM
> To: Richardson, Bruce
> Cc: Thomas Monjalon; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32
> 
> On Fri, Sep 19, 2014 at 09:18:26AM +, Richardson, Bruce wrote:
> > > -Original Message-
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Thursday, September 18, 2014 7:09 PM
> > > To: Thomas Monjalon
> > > Cc: Richardson, Bruce; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32
> > >
> > > On Thu, Sep 18, 2014 at 07:13:52PM +0200, Thomas Monjalon wrote:
> > > > 2014-09-18 15:53, Richardson, Bruce:
> > > > > > > --- a/app/test-pmd/testpmd.c
> > > > > > > +++ b/app/test-pmd/testpmd.c
> > > > > > > @@ -225,7 +225,7 @@ struct rte_eth_thresh tx_thresh = {
> > > > > > >  /*
> > > > > > >   * Configurable value of RX free threshold.
> > > > > > >   */
> > > > > > > -uint16_t rx_free_thresh = 0; /* Immediately free RX descriptors 
> > > > > > > by
> > > default. */
> > > > > > > +uint16_t rx_free_thresh = 32; /* Refill RX descriptors once 
> > > > > > > every 32
> > > packets
> > > > > > */
> > > > > > >
> > > > > >
> > > > > > Why 32?  Was that an experimentally determined value?
> > > > > > Does it hold true for all PMD's?
> > > > >
> > > > > This is primarily for the ixgbe PMD, which is right now the most
> > > > > highly tuned driver, but it works fine for all other ones too,
> > > > > as far as I'm aware.
> > > >
> > > > Yes, you are changing this value for all PMDs but you're targetting
> > > > only one.
> > > > These thresholds are dependent of the PMD implementation. There's
> > > > something wrong here.
> > > >
> > > I agree. Its fine to do this, but it does seem like the sample application
> > > should document why it does this and make note of the fact that other PMDs
> > > may
> > > have a separate optimal value.
> > >
> > > > > Basically, this is the minimum setting needed to enable either the
> > > > > bulk alloc or vector RX routines inside the ixgbe driver, so it's
> > > > > best made the default for that reason. Please see
> > > > > "check_rx_burst_bulk_alloc_preconditions()" in ixgbe_rxtx.c, and
> > > > > RX function assignment logic in "ixgbe_dev_rx_queue_setup()" in
> > > > > the same file.
> > > >
> > > > Since this parameter is so important, it could be a default value
> somewhere.
> > > >
> > > > I think we should split generic tuning parameters and tuning parameters
> > > > related to driver implementation or specific hardware.
> > > > Then we should provide some good default values for each of them.
> > > > At last, if needed, applications should be able to easily tune the
> > > > pmd-specific parameters.
> > > >
> > > I like this idea.  I've not got an idea of how much work it is to do so, 
> > > but in
> > > principle it makes sense.
> > >
> > > Perhaps for the immediate need, since rte_eth_rx_queue_setup allows the
> > > config
> > > struct to get passed directly to PMDs, we can create a reserved value
> > > RTE_ETH_RX_FREE_THRESH_OPTIMAL, that instructs the pmd to select
> > > whatever
> > > threshold is optimal for its own hardware?
> > >
> > > Neil
> > >
> > Actually, looking at the code, I would suggest a couple of options, some of
> which may be used together.
> > 1) we make NULL a valid value for the rxconf structure parameter to
> rte_eth_rx_queue_setup. There is little information in it that should really 
> need
> to be passed in by applications to the drivers, and that would allow the 
> drivers to
> be completely free to select the best options for their own operation.
> > 2) As a companion to that (or as an alternative), we could also allow
> each driver to provide its own functions for rte_eth_get_rxconf_default, and
> rte_eth_get_txconf_default, to be used by applications that want to use known-
> good values for thresholds but also want to tweak one of the other values e.g.
> for rx, set the drop_en bit, and for tx set the txqflags to disable offloads.
> > 3) Lastly, we could also consider removing the threshold and other not-
> generally-used values from the rxconf and txconf structures and make those
> removed fields completely driver-set values. Optionally, we could provide an
> alternate API to tune them, but I don't really see this being useful in most 
> cases,
> and I'd probably omit it unless someone can prove a need for such APIs.
> >
> These all sound fairly reasonable to me.
> Neil

Further thinking seems to me like 1 doesn't really go very far, so it falls 
between 2 and 3. Any preference between them?

/Bruce


[dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32

2014-09-19 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, September 18, 2014 7:09 PM
> To: Thomas Monjalon
> Cc: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32
> 
> On Thu, Sep 18, 2014 at 07:13:52PM +0200, Thomas Monjalon wrote:
> > 2014-09-18 15:53, Richardson, Bruce:
> > > > > --- a/app/test-pmd/testpmd.c
> > > > > +++ b/app/test-pmd/testpmd.c
> > > > > @@ -225,7 +225,7 @@ struct rte_eth_thresh tx_thresh = {
> > > > >  /*
> > > > >   * Configurable value of RX free threshold.
> > > > >   */
> > > > > -uint16_t rx_free_thresh = 0; /* Immediately free RX descriptors by
> default. */
> > > > > +uint16_t rx_free_thresh = 32; /* Refill RX descriptors once every 32
> packets
> > > > */
> > > > >
> > > >
> > > > Why 32?  Was that an experimentally determined value?
> > > > Does it hold true for all PMD's?
> > >
> > > This is primarily for the ixgbe PMD, which is right now the most
> > > highly tuned driver, but it works fine for all other ones too,
> > > as far as I'm aware.
> >
> > Yes, you are changing this value for all PMDs but you're targetting
> > only one.
> > These thresholds are dependent of the PMD implementation. There's
> > something wrong here.
> >
> I agree. Its fine to do this, but it does seem like the sample application
> should document why it does this and make note of the fact that other PMDs
> may
> have a separate optimal value.
> 
> > > Basically, this is the minimum setting needed to enable either the
> > > bulk alloc or vector RX routines inside the ixgbe driver, so it's
> > > best made the default for that reason. Please see
> > > "check_rx_burst_bulk_alloc_preconditions()" in ixgbe_rxtx.c, and
> > > RX function assignment logic in "ixgbe_dev_rx_queue_setup()" in
> > > the same file.
> >
> > Since this parameter is so important, it could be a default value somewhere.
> >
> > I think we should split generic tuning parameters and tuning parameters
> > related to driver implementation or specific hardware.
> > Then we should provide some good default values for each of them.
> > At last, if needed, applications should be able to easily tune the
> > pmd-specific parameters.
> >
> I like this idea.  I've not got an idea of how much work it is to do so, but 
> in
> principle it makes sense.
> 
> Perhaps for the immediate need, since rte_eth_rx_queue_setup allows the
> config
> struct to get passed directly to PMDs, we can create a reserved value
> RTE_ETH_RX_FREE_THRESH_OPTIMAL, that instructs the pmd to select
> whatever
> threshold is optimal for its own hardware?
> 
> Neil
> 
Actually, looking at the code, I would suggest a couple of options, some of 
which may be used together.
1) we make NULL a valid value for the rxconf structure parameter to 
rte_eth_rx_queue_setup. There is little information in it that should really 
need to be passed in by applications to the drivers, and that would allow the 
drivers to be completely free to select the best options for their own 
operation. 
2) As a companion to that (or as an alternative), we could also allow 
each driver to provide its own functions for rte_eth_get_rxconf_default, and 
rte_eth_get_txconf_default, to be used by applications that want to use 
known-good values for thresholds but also want to tweak one of the other values 
e.g. for rx, set the drop_en bit, and for tx set the txqflags to disable 
offloads.
3) Lastly, we could also consider removing the threshold and other 
not-generally-used values from the rxconf and txconf structures and make those 
removed fields completely driver-set values. Optionally, we could provide an 
alternate API to tune them, but I don't really see this being useful in most 
cases, and I'd probably omit it unless someone can prove a need for such APIs.

Regards,
/Bruce


[dpdk-dev] [PATCH 0/4] Add DSO symbol versioning to support backwards compatibility

2014-09-19 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, September 18, 2014 8:14 PM
> To: Thomas Monjalon
> Cc: dev at dpdk.org; Richardson, Bruce
> Subject: Re: [PATCH 0/4] Add DSO symbol versioning to support backwards
> compatibility
> 
> On Thu, Sep 18, 2014 at 08:23:36PM +0200, Thomas Monjalon wrote:
> > Hi Neil,
> >
> > 2014-09-15 15:23, Neil Horman:
> > > The DPDK ABI develops and changes quickly, which makes it difficult for
> > > applications to keep up with the latest version of the library, 
> > > especially when
> > > it (the DPDK) is built as a set of shared objects, as applications may be 
> > > built
> > > against an older version of the library.
> > >
> > > To mitigate this, this patch series introduces support for library and 
> > > symbol
> > > versioning when the DPDK is built as a DSO.  Specifically, it does 4 
> > > things:
> > >
> > > 1) Adds initial support for library versioning.  Each library now has a 
> > > version
> > > map that explicitly calls out what symbols are exported to using 
> > > applications,
> > > and assigns version(s) to them
> > >
> > > 2) Adds support macros so that when libraries create incompatible ABI's,
> > > multiple versions may be supported so that applications linked against 
> > > older
> > > DPDK releases can continue to function
> > >
> > > 3) Adds library soname versioning suffixes so that when ABI's must be 
> > > broken
> in
> > > a fashion that requires a rebuild of older applications, they will break 
> > > at load
> > > time, rather than cause unexpected issues at run time.
> > >
> > > 4) Adds documentation for ABI policy, and provides space to document
> deprecated
> > > ABI versions, so that applications might be warned of impending changes.
> > >
> > > With these elements in place the DPDK has some support to allow for the
> extended
> > > maintenence of older API's while still allowing the freedom to develop new
> and
> > > improved API's.
> > >
> > > Implementing this feature will require some additional effort on the part 
> > > of
> > > developers and reviewers.  When reviewing patches, must be checked
> against
> > > existing exports to ensure that the function prototypes are not changing. 
> > >  If
> > > they are, the versioning macros must be used, and the library export map
> should
> > > be updated to reflect the new version of the function.
> > >
> > > When data structures change, if those structures are application 
> > > accessible,
> > > apis that accept or return instances of those data structures should have 
> > > new
> > > versions created so that users of the old data structure version might co-
> exist
> > > at the same time.
> >
> > Thanks for your efforts.
> > But I feel this change has too many constraints for the current status of
> > the DPDK. It's probably too early to adopt such policy.
> >
> I think you may be misunderstanding something.  What constraints do you
> beleive
> that this patch imposes?  Note it doesn't in any way prevent changes to the 
> ABI
> of the DPDK, but rather gives us infrastructure to support multiple ABI
> revisions at the same time, so that applications built against DPDK shared
> libraries can continue to function properly at least for some time until we
> decide to deprecate that ABI level.
> 

I view all this as a positive step. I consider backward compatibility as 
something that should always be encouraged, and I agree with Neil that this 
should allow us to guarantee compatibility for our customers while still having 
a path open to us to change things if we really need to.

> This is all based on the versioning strategy outlined here:
> http://www.akkadia.org/drepper/dsohowto.pdf
> 
> That may help clarify things for you.
> 
> > By the way, this versioning doesn't cover structure changes.
> No, it doesn't.  No link-time mechanism does so.
> 
> > How could it be managed?
> Thats a subject that is open to discussion, but my initial thinking is that we
> need to handle it on a case by case basis:
> 
> * For minor updates, where allocation of a structure is done on the heap and
> new
> fields need to be added, appending them to the end of a structure and 
> providing
> an initial value is sufficient.
> 
> * For major changes, where fields need to be removed, or re-arranged, mostly
> likely the API surfaces which a

[dpdk-dev] [PATCH] ixgbe: fix compile error with gcc4.4 (used RHEL 6)

2014-09-18 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, September 18, 2014 4:51 PM
> To: Thomas Monjalon
> Cc: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix compile error with gcc4.4 (used 
> RHEL
> 6)
> 
> On Thu, Sep 18, 2014 at 05:38:58PM +0200, Thomas Monjalon wrote:
> > Hi Neil,
> >
> > 2014-09-18 11:03, Neil Horman:
> > > Thank you, I've reproduced the problem here, and you're right, it is a
> > > compiler bug, but I really hate the idea of just throwing braces around
> > > something to shut the compiler up, especially when the compiler has since
> > > been fixed.  Looking at it a bit more closely it seems like the the thing
> > > to do is actually just consistently use rte_mbuf_refcnt_set, since thats
> > > what the rte_mbuf header documentation says to do, and protects you if the
> > > internal structure changes, as well as prevents you from having to spread
> > > ifdef RTE_MBUF_REFCNT all over the place.
> > >
> > > This patch does a bit more than that too.  With a little creative
> > > typedef-ing we don't need the anonymous union at all, and lets us just use
> > > a single refcnt variable, and I think eliminate that odd refcnt_reserved
> > > member of the union, that, as far as I can see, does nothing.
> >
> > > --- a/config/common_linuxapp
> > > +++ b/config/common_linuxapp
> > > @@ -123,7 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
> > >  CONFIG_RTE_LIBEAL_USE_HPET=n
> > >  CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
> > >  CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
> > > -CONFIG_RTE_EAL_IGB_UIO=y
> > > +CONFIG_RTE_EAL_IGB_UIO=n
> > >  CONFIG_RTE_EAL_VFIO=y
> >
> > Hum, indeed this patch does a bit more ;)
> >
> Sorry, meant to clip that out, I was building without kernel headers, so I had
> to disable igb_uio and kni.  I'll propose this patch properly if theres
> consensus on the valid bits.
> Neil

If we get rid of the REFCNT compile-time option (but not the REFCNT_ATOMIC 
one), then we should be able to get rid of the union, as we can do the 
atomic/non-atomic entirely inside the refcnt manipulation routines (since a 
regular uint16_t can be typecast directly to an atomic form of the same). Once 
we get rid of the union we can get rid of the extra braces that go along with 
the union, and we can just have the static initialiser cleanly put in the code.
Whaddaya think?

/Bruce


[dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32

2014-09-18 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, September 17, 2014 4:30 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/5] testpmd: Change rxfreet default to 32
> 
> On Wed, Sep 17, 2014 at 11:01:40AM +0100, Bruce Richardson wrote:
> > To improve performance by using bulk alloc or vectored RX routines, we
> > need to set rx free threshold (rxfreet) value to 32, so make this the
> > testpmd default.
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  app/test-pmd/testpmd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > index 9f6cdc4..5751607 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -225,7 +225,7 @@ struct rte_eth_thresh tx_thresh = {
> >  /*
> >   * Configurable value of RX free threshold.
> >   */
> > -uint16_t rx_free_thresh = 0; /* Immediately free RX descriptors by 
> > default. */
> > +uint16_t rx_free_thresh = 32; /* Refill RX descriptors once every 32 
> > packets
> */
> >
> 
> Why 32?  Was that an experimentally determined value?  Does it hold true for 
> all
> PMD's?

This is primarily for the ixgbe PMD, which is right now the most highly tuned 
driver, but it works fine for all other ones too, as far as I'm aware. 
Basically, this is the minimum setting needed to enable either the bulk alloc 
or vector RX routines inside the ixgbe driver, so it's best made the default 
for that reason. Please see " check_rx_burst_bulk_alloc_preconditions()" in 
ixgbe_rxtx.c, and RX function assignment logic in "ixgbe_dev_rx_queue_setup()" 
in the same file. 

/Bruce



[dpdk-dev] [PATCH] ixgbe: fix compile error with gcc4.4 (used RHEL 6)

2014-09-18 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, September 18, 2014 1:25 PM
> To: Thomas Monjalon
> Cc: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix compile error with gcc4.4 (used 
> RHEL
> 6)
> 
> On Thu, Sep 18, 2014 at 01:09:16PM +0200, Thomas Monjalon wrote:
> > > The refcnt field is contained within an anonymous union within the mbuf
> > > data structure, and gcc 4.4 gives an error about an unknown field unless
> > > the initialiser for the field is contained within extra braces.
> > >
> > > Signed-off-by: Bruce Richardson 
> >
> > Acked-by: Thomas Monjalon 
> >
> > Thanks Bruce, it is now applied.
> 
> Hang on here, we use anonymous unions all the time in RHEL6, and make
> assignments to them frequently, and the compiler doesn't complain (see the
> dropcount variable in sk_buff for an example).  Not saying that this is a big
> deal, but can you explain a little more about what you're seeing when this 
> error
> occurs, before we just paper over it?
> 

Originally reported on RHEL6 as a build failure. When I use gcc4.4 on Fedora 
20, I get the following without this change:

  CC ixgbe_rxtx_vec.o
== Build lib/librte_table
/home/bruce/dpdk.org/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c: In function 
'ixgbe_rxq_vec_setup':
/home/bruce/dpdk.org/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:726: error: unknown 
field 'refcnt' specified in initializer
compilation terminated due to -Wfatal-errors.
make[5]: *** [ixgbe_rxtx_vec.o] Error 1
make[4]: *** [librte_pmd_ixgbe] Error 2
make[4]: *** Waiting for unfinished jobs
make[3]: *** [lib] Error 2
make[2]: *** [all] Error 2
make[1]: *** [x86_64-native-linuxapp-gcc_install] Error 2
make: *** [install] Error 2

Regards,
/Bruce


[dpdk-dev] [PATCH 4/5] mbuf: add userdata pointer field

2014-09-17 Thread Richardson, Bruce
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, September 17, 2014 4:35 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/5] mbuf: add userdata pointer field
> 
> On Wed, Sep 17, 2014 at 11:01:41AM +0100, Bruce Richardson wrote:
> > While some applications may store metadata about packets in the packet
> > mbuf headroom, this is not a workable solution for packet metadata which
> > is either:
> > * larger than the headroom (or headroom is needed for adding pkt headers)
> > * needs to be shared or copied among packets
> >
> > To support these use cases in applications, we reserve a general
> > "userdata" pointer field inside the second cache-line of the mbuf. This
> > is better than having the application store the pointer to the external
> > metadata in the packet headroom, as it saves an additional cache-line
> > from being used.
> >
> > Apart from storing metadata, this field also provides a general 8-byte
> > scratch space inside the mbuf for any other application uses that are
> > applicable.
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 3 ++-
> >  lib/librte_mbuf/rte_mbuf.h| 3 +++
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > index 25ed672..d27e891 100644
> > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > @@ -117,7 +117,8 @@ struct rte_kni_mbuf {
> > uint16_t data_len;  /**< Amount of data in segment buffer. */
> > uint32_t pkt_len;   /**< Total pkt len: sum of all segment 
> > data_len. */
> > char pad3[8];
> > -   void *pool __attribute__((__aligned__(64)));
> > +   void *pad4 __attribute__((__aligned__(64)));
> > +   void *pool;
> I don't see a comment about this in the changelog, only about the userdata
> pointer being added below.

Yes, this is the userdata pointer - just added as padding here, since it's not 
actually needed by the kernel-side KNI module.

> 
> 
> > void *next;
> >  };
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 8e27d2e..b1acfc3 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -172,6 +172,9 @@ struct rte_mbuf {
> >
> > /* second cache line - fields only used in slow path or on TX */
> > MARKER cacheline1 __rte_cache_aligned;
> > +
> > +   void *userdata;   /**< Can be used for external metadata */
> > +
> Do you want to make this a void* or a char[8]?  I ask because if people are
> going to use is as a scratch space (rather than a pointer), they get a suprise
> when they build this on 32 bit systems, and their 8 byte scratch space is
> reduced to 4 bytes.

I think this is better as a pointer, as that is how it is likely to be used. As 
for 32-bit, I'm torn between wanting to just update the comment and feeling the 
need to update the code to actually make the thing 8-byte on 32-bit! Changing 
the comment to be more accurate is easier, unions are ugly looking in the 
structure IMHO, so maybe I'll just mark the following field (pool) as always 
8-byte aligned

/Bruce

> 
> Neil
> 
> > struct rte_mempool *pool; /**< Pool from which mbuf was allocated.
> */
> > struct rte_mbuf *next;/**< Next segment of scattered packet. */
> >
> > --
> > 1.9.3
> >
> >


[dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx perf

2014-09-17 Thread Richardson, Bruce

> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, September 17, 2014 4:21 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path 
> tx
> perf
> 
> On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:
> > Make a small improvement to slow path TX performance by adding in a
> > prefetch for the second mbuf cache line.
> > Also move assignment of l2/l3 length values only when needed.
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++-
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > index 6f702b3..c0bb49f 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts,
> > ixgbe_xmit_cleanup(txq);
> > }
> >
> > +   rte_prefetch0(>mbuf->pool);
> > +
> 
> Can you explain what all of these prefetches are doing?  It looks to me like
> they're just fetching the first caheline of the mempool structure, which it
> appears amounts to the pools name.  I don't see that having any use here.
> 
This does make a decent enough performance difference in my tests (the amount 
varies depending on the RX path being used by testpmd). 

What I've done with the prefetches is two-fold:
1) changed it from prefetching the mbuf (first cache line) to prefetching the 
mbuf pool pointer (second cache line) so that when we go to access the pool 
pointer to free transmitted mbufs we don't get a cache miss. When clearing the 
ring and freeing mbufs, the pool pointer is the only mbuf field used, so we 
don't need that first cache line.
2) changed the code to prefetch earlier - in effect to prefetch one mbuf ahead. 
The original code prefetched the mbuf to be freed as soon as it started 
processing the mbuf to replace it. Instead now, every time we calculate what 
the next mbuf position is going to be we prefetch the mbuf in that position 
(i.e. the mbuf pool pointer we are going to free the mbuf to), even while we 
are still updating the previous mbuf slot on the ring. This gives the prefetch 
much more time to resolve and get the data we need in the cache before we need 
it.

Hope this clarifies things.

/Bruce


[dpdk-dev] Patch merges for 1.8 release

2014-09-16 Thread Richardson, Bruce
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, September 16, 2014 3:20 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: Patch merges for 1.8 release
> 
> Hi Bruce,
> 
> 2014-09-16 11:18, Bruce Richardson:
> > I'm just wondering what the expected timeline is for you starting to merge
> patches in for the 1.8 release? As you have
> > probably seen, I have quite a number of outstanding patches submitted for
> mbufs, and I'm wonder when the "part 1" set,
> > which by now has been pretty well reviewed and ack'ed, might be merged to
> the mainline? I'm loath to start upstreaming
> > drafts of part 3 patches at this stage, because the depth of the backlog I'm
> building up just increases the amount
> > of work I have to do for any mainline changes that require me to do changes
> on rebase.
> 
> I was very busy last week with DPDK summit and IDF.
> Now I'm back and I'm cleaning up the patchwork tool (before availability
> announcement in a couple of days).
> Then the goal of the next days is to make a -rc1 tag with all the patches
> which are ready and properly reviewed.
> The mbuf changes are first in my list.
> 

Thanks for the info, and welcome back!

/Bruce


[dpdk-dev] [PATCH v2 02/13] mbuf: reorder fields by time of use

2014-09-15 Thread Richardson, Bruce
> -Original Message-
> From: Liu, Jijiang
> Sent: Monday, September 15, 2014 8:12 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 02/13] mbuf: reorder fields by time of use
> 
> Hi Bruce,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, September 11, 2014 9:16 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 02/13] mbuf: reorder fields by time of use
> >
> > *  Reorder the fields in the mbuf so that we have fields that are used 
> > together
> > side-by-side in the structure. This means that we have a contiguous block of
> > 8-bytes in the mbuf which are used to reset an mbuf of descriptor rearm, 
> > and a
> > block of 16-bytes of data (excluding flags) which are set on RX from the
> > received packet descriptor.
> > * Use dummy fields as appropriate to ensure alignment or to reserve gaps for
> > later field additions.
> > * Place most items which are not used by fast-path RX separately at the end 
> > of
> > the structure so they can later be moved to a separate cache line.
> > [The l2/l3 length fields are not moved at this stage as doing so will cause
> > overflow to the next cache line].
> >
> > Updated in V2:
> > * Updated the KNI buffer structure to match that of new mbuf
> > * Cleaned up commit message typos.
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  .../linuxapp/eal/include/exec-env/rte_kni_common.h | 12 ++-
> >  lib/librte_mbuf/rte_mbuf.h | 25
> > --
> >  2 files changed, 21 insertions(+), 16 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > index 07908ac..f2b502c 100644
> > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > @@ -108,15 +108,17 @@ struct rte_kni_fifo {
> >   * Padding is necessary to assure the offsets of these fields
> >   */
> >  struct rte_kni_mbuf {
> > -   void *pool;
> > void *buf_addr;
> > -   char pad0[16];
> > -   void *next;
> > +   char pad0[10];
> > uint16_t data_off;  /**< Start address of data in segment buffer. */
> > +   char pad1[4];
> > +   uint16_t ol_flags;  /**< Offload features. */
> > +   char pad2[8];
> > uint16_t data_len;  /**< Amount of data in segment buffer. */
> > uint32_t pkt_len;   /**< Total pkt len: sum of all segment data_len.
> > */
> > -   char pad2[4];
> > -   uint16_t ol_flags;  /**< Offload features. */
> > +   char pad3[8];
> > +   void *pool;
> > +   void *next;
> >  } __attribute__((__aligned__(64)));
> >
> >  /*
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index
> > 71080e5..413a5a1 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -115,16 +115,12 @@ extern "C" {
> >   * The generic rte_mbuf, containing a packet mbuf.
> >   */
> >  struct rte_mbuf {
> > -   struct rte_mempool *pool; /**< Pool from which mbuf was allocated.
> */
> > void *buf_addr;   /**< Virtual address of segment buffer. */
> > phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
> > -   uint16_t buf_len; /**< Length of segment buffer. */
> >
> > -   /* valid for any segment */
> > -   struct rte_mbuf *next;/**< Next segment of scattered packet. */
> > +   /* next 8 bytes are initialised on RX descriptor rearm */
> > +   uint16_t buf_len; /**< Length of segment buffer. */
> > uint16_t data_off;
> > -   uint16_t data_len;/**< Amount of data in segment buffer. */
> > -   uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
> >
> >  #ifdef RTE_MBUF_REFCNT
> > /**
> > @@ -142,14 +138,17 @@ struct rte_mbuf {
> >  #else
> > uint16_t refcnt_reserved; /**< Do not use this field */
> >  #endif
> > -   uint16_t reserved;/**< Unused field. Required for padding
> > */
> > -   uint16_t ol_flags;/**< Offload features. */
> > -
> > -   /* these fields are valid for first segment only */
> > uint8_t nb_segs;/**< Number of segments. */
> > uint8_t port;   /**< Input port. */
> >
> &

[dpdk-dev] [PATCH] librte_log: add function to retrieve log_level

2014-09-15 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
> Sent: Sunday, September 14, 2014 9:35 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] librte_log: add function to retrieve log_level
> 
> Signed-off-by: Matthew Hall 
Acked-by: Bruce Richardson 

> ---
>  lib/librte_eal/common/eal_common_log.c  | 7 +++
>  lib/librte_eal/common/include/rte_log.h | 6 ++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/lib/librte_eal/common/eal_common_log.c
> b/lib/librte_eal/common/eal_common_log.c
> index e4df0b9..d979f28 100644
> --- a/lib/librte_eal/common/eal_common_log.c
> +++ b/lib/librte_eal/common/eal_common_log.c
> @@ -176,6 +176,13 @@ rte_set_log_level(uint32_t level)
>   rte_logs.level = (uint32_t)level;
>  }
> 
> +/* Get global log level */
> +uint32_t
> +rte_get_log_level()
> +{
> + return rte_logs.level;
> +}
> +
>  /* Set global log type */
>  void
>  rte_set_log_type(uint32_t type, int enable)
> diff --git a/lib/librte_eal/common/include/rte_log.h
> b/lib/librte_eal/common/include/rte_log.h
> index 565415a..7f1c2f9 100644
> --- a/lib/librte_eal/common/include/rte_log.h
> +++ b/lib/librte_eal/common/include/rte_log.h
> @@ -130,6 +130,12 @@ int rte_openlog_stream(FILE *f);
>  void rte_set_log_level(uint32_t level);
> 
>  /**
> + * Get the global log level.
> + *
> + */
> +uint32_t rte_get_log_level(void);
> +
> +/**
>   * Enable or disable the log type.
>   *
>   * @param type
> --
> 1.9.1



[dpdk-dev] [PATCH 0/3] eal affinitize low priority threads to lcore 0

2014-09-12 Thread Richardson, Bruce
> -Original Message-
> From: Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com]
> Sent: Friday, September 12, 2014 12:48 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 0/3] eal affinitize low priority threads to 
> lcore 0
> 
> Hi Bruce,
> 
> > Subject: [dpdk-dev] [PATCH 0/3] eal affinitize low priority threads to 
> > lcore 0
> >
> > This patchset sets things up so that we can affinitize the interrupt,
> > vfio management, and hpet timer management threads to lcore 0, so that
> > they never interfere with data plane threads.
> 
> I don't think it works well always.
> The management threads can be floating all cpus on demand, because those
> threads are created before the master thread affinity is set. The kernel
> scheduler will take care of it. And we should isolate cpus which data plane
> threads are pinned to, so the management threads cannot run on those isolated
> cpus data plane thread run.

Yes, this was my understanding as well of how things would work - we just had a 
report back of where the management threads were using the same core as the 
data-plane despite the core being isolated. I'll double-check to see if this 
really is happening and resubmit a new patch with a configurable core 
assignment if it is definitely needed.

/Bruce

> In some cases, the user may run data plane thread on lcore 0, but with
> this patchset the data plane pinned to lcore 0 always run with the
> management threads. That doesn't seem good.
> 
> I think this functionality should be conditional.
> How about to add a parameter to specify the mask for the management threads
> instead of statically assignment to lcore 0?
> 
> thanks,
> Hiroshi
> 
> >
> > Bruce Richardson (3):
> >   eal: add core id param to  eal_thread_set_affinity
> >   eal: increase scope of eal_thread_set_affinity
> >   eal: affinitize low-priority threads to lcore 0
> >
> >  lib/librte_eal/bsdapp/eal/eal_thread.c | 12 ++--
> >  lib/librte_eal/common/include/eal_private.h| 10 ++
> >  lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  5 +
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c |  6 ++
> >  lib/librte_eal/linuxapp/eal/eal_thread.c   | 12 ++--
> >  lib/librte_eal/linuxapp/eal/eal_timer.c|  5 +
> >  6 files changed, 38 insertions(+), 12 deletions(-)
> >
> > --
> > 1.9.3



[dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel

2014-09-11 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Hu (NSBU)
> Sent: Wednesday, September 10, 2014 11:41 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] dpdk starting issue with descending virtual address
> allocation in new kernel
> 
> Hi All,
> 
> We have a kernel config question to consult you.
> DPDK failed to start due to mbuf creation issue with new kernel 3.14.17 +
> grsecurity patches.
> We tries to trace down the issue, it seems that  the virtual address of huge 
> page
> is allocated from high address to low address by kernel where dpdk expects it 
> to
> be low to high to think it is as consecutive. See dumped virt address bellow. 
> It is
> first 0x71042140, then 0x71042120. Where previously it would be
> 0x71042120 first , then 0x71042140. But they are still consecutive.
> 
> Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 00:0c:29:b3:30:db
> Create: Default RX  0:0  - Memory used (MBUFs 4096 x (size 1984 + Hdr 
> 64)) +
> 790720 =   8965 KB
> Zone 0: name:, phys:0x6ac0, len:0x2080,
> virt:0x71042140, socket_id:0, flags:0
> Zone 1: name:, phys:0x6ac02080, len:0x1d10c0,
> virt:0x710421402080, socket_id:0, flags:0
> Zone 2: name:, phys:0x6ae0, len:0x16,
> virt:0x71042120, socket_id:0, flags:0
> Zone 3: name:, phys:0x6add3140, len:0x11a00,
> virt:0x7104215d3140, socket_id:0, flags:0
> Zone 4: name:, phys:0x6ade4b40, len:0x300,
> virt:0x7104215e4b40, socket_id:0, flags:0
> Zone 5: name:, phys:0x6ade4e80, len:0x200,
> virt:0x7104215e4e80, socket_id:0, flags:0
> Zone 6: name:, phys:0x6ade5080, len:0x10080,
> virt:0x7104215e5080, socket_id:0, flags:0
> Segment 0: phys:0x6ac0, len:2097152, virt:0x71042140, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 1: phys:0x6ae0, len:2097152, virt:0x71042120, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 2: phys:0x6b00, len:2097152, virt:0x71042100, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 3: phys:0x6b20, len:2097152, virt:0x710420e0, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 4: phys:0x6b40, len:2097152, virt:0x710420c0, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 5: phys:0x6b60, len:2097152, virt:0x710420a0, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 6: phys:0x6b80, len:2097152, virt:0x71042080, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 7: phys:0x6ba0, len:2097152, virt:0x71042060, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 8: phys:0x6bc0, len:2097152, virt:0x71042040, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 9: phys:0x6be0, len:2097152, virt:0x71042020, socket_id:0,
> hugepage_sz:2097152, nchannel:0, nrank:0
> ---
> 
> 
> 
> 
> 
> Related dpdk code is in
> dpdk/lib/librte_eal/linuxapp/eal/eal_memory.c  :: rte_eal_hugepage_init()
> for (i = 0; i < nr_hugefiles; i++) {
> new_memseg = 0;
> 
> /* if this is a new section, create a new memseg */
> if (i == 0)
> new_memseg = 1;
> else if (hugepage[i].socket_id != hugepage[i-1].socket_id)
> new_memseg = 1;
> else if (hugepage[i].size != hugepage[i-1].size)
> new_memseg = 1;
> else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
> hugepage[i].size)
> new_memseg = 1;
> else if (((unsigned long)hugepage[i].final_va -
> (unsigned long)hugepage[i-1].final_va) != hugepage[i].size) {
> new_memseg = 1;
> }
> 
> 
> 
> Is this a known issue? Is there any workaround? Or Could you advise which
> kernel config may relate this this kernel behavior change?
> 
> Thanks,
> Michael

This should not be a problem for Intel DPDK startup, as the EAL should take 
care of mmaps being done in this order at startup. 
By way of background, where I have seen this occur before is on 32-bit systems, 
while 64-bit systems tend to mmap in ascending order in every case I've looked 
at. [Q: is this 32-bit you are running, or 64-bit]. In either case, we modified 
the EAL memory mapping code some time back to try and take account of this - if 
you look in map_all_hugepages function in eal_memory.c, you will see that, when 
we go to do the second mapping of hugepages to line the pages up, we do so by 
explicitly specifying our preferred address. We get this address by allocating 
a large block of memory from /dev/zero, taking the address and then freeing it 
again. Then we map the pages one-at-a-time into that free address block, so 
that even when the kernel wants to map things from hi to lo address, our 
address hints still cause things to map from lo to hi. If this does not work 
for you, I'd be curious to find out why. Do any of the security patches you 
have applied prevent mmap address hinting from working, for instance?

Regards,

[dpdk-dev] l2fwd does not send packets

2014-09-11 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xin Li
> Sent: Wednesday, September 10, 2014 10:19 PM
> To: dev
> Subject: [dpdk-dev] l2fwd does not send packets
> 
> Hi,
> 
> The l2fwd sample application in my environment does not send packets
> through the TX port. I run DPDK inside a KVM guest. The NIC ports are VFs
> assigned to the VM by pci passthrough.
> 
> Environment:
> 
> Host OS: ubuntu 14.04 x86_64
> NIC: intel x540-t1
> Guest OS: ubuntu 14.04 x86_64
> DPDK: v1.7.0
> 
> Some findings:
> 
> 1. l2fwd reports 511 packets sent when max tx descriptor is 512, The number
> changes to 1023 if the max tx descriptor is set to 1024.
> 
> 2. On the receiver side, no packet captured.
> 
> Anyone know the issue and the corresponding fix? Thanks.
> 

One thing that is worth checking is that the source mac address of the packets 
being sent matches that of the VF interface, as the packets can't be sent 
otherwise.

/Bruce


[dpdk-dev] Testing vmdq sample application

2014-09-11 Thread Richardson, Bruce
Also check that the mac addresses are configured correctly on the packets being 
sent. In vmdq_dcb mode, the nic doesn't act in promiscuous mode picking up all 
packet irrespective of MAC address, so the destination mac must match that of 
the NIC port.

/Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wei-Chun Chao
> Sent: Wednesday, September 10, 2014 8:49 PM
> To: ANKIT BATRA
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Testing vmdq sample application
> 
> Is your traffic VLAN tagged?
> 
> I think vmdq_app has "conf.enable_default_pool = 0;" so untagged
> traffic will be dropped.
> 
> Thanks,
> Weichun
> 
> On Wed, Sep 10, 2014 at 12:00 PM, ANKIT BATRA 
> wrote:
> > Hi,
> >
> > I have started the application on my host machine.And from another terminal
> > on host machine gave sudo killall -HUP vmdq_dcb_app and sent packets on
> NIC
> > card from another machine .But on terminal where vmdq application is
> > running, I am seeing that no packets are coming.All rows and columns are
> > 0.And where will the VMs will come into picture here for testing this.
> > Please suggest and correct me if i am doing anything incorrect.
> >
> > On Wed, Sep 10, 2014 at 6:24 AM, Ouyang, Changchun <
> > changchun.ouyang at intel.com> wrote:
> >
> >> Hi
> >>
> >> Firstly compiling the application
> >> 1. Go to the examples directory:
> >> export RTE_SDK=/path/to/rte_sdk
> >> cd ${RTE_SDK}/examples/vmdq
> >> 2. Set the target (a default target is used if not specified). For example:
> >> export RTE_TARGET=x86_64-native-linuxapp-gcc
> >> See the Intel? DPDK Getting Started Guide for possible RTE_TARGET values.
> >> 3. Build the application:
> >> make
> >>
> >> Then running the application
> >> To run the example in a linuxapp environment:
> >> user at target:~$ ./build/vmdq_app -c f -n 4 -- --nb-pools 8
> >>
> >> If you use 1G NIC, 8 pools are available,
> >> If you use 10G NIC, 64 pools are available,
> >>
> >> At last, send packets with vlan tag to select a pool.
> >>
> >> The vlan tag and pool has the following mapping:
> >> const uint16_t vlan_tags[] = {
> >> 0,  1,  2,  3,  4,  5,  6,  7,// It occupies pool 0 ~ pool
> >> 7, one for each
> >> 8,  9, 10, 11,  12, 13, 14, 15, // It occupies pool 8 ~ pool
> >> 15, one for each
> >> 16, 17, 18, 19, 20, 21, 22, 23,// It occupies pool 16 ~ pool
> >> 23, one for each
> >> 24, 25, 26, 27, 28, 29, 30, 31,// It occupies pool 24 ~ pool
> >> 31, one for each
> >> 32, 33, 34, 35, 36, 37, 38, 39,// It occupies pool 32 ~ pool
> >> 39, one for each
> >> 40, 41, 42, 43, 44, 45, 46, 47,// It occupies pool 40 ~ pool
> >> 47, one for each
> >> 48, 49, 50, 51, 52, 53, 54, 55,// It occupies pool 48 ~ pool
> >> 55, one for each
> >> 56, 57, 58, 59, 60, 61, 62, 63,// It occupies pool 56 ~ pool
> >> 63, one for each
> >> };
> >>
> >> Thanks
> >> Changchun
> >>
> >> > -Original Message-
> >> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of ANKIT BATRA
> >> > Sent: Wednesday, September 10, 2014 3:23 AM
> >> > To: dev at dpdk.org
> >> > Subject: [dpdk-dev] Testing vmdq sample application
> >> >
> >> > Hi,
> >> >
> >> >  I am trying to run vmdq sample application.But not getting how to test
> >> > this.Can anyone please help with detailed procedure how to test this
> >> sample
> >> > application.
> >> > --
> >> > Regards
> >> > Ankit Batra
> >>
> >
> >
> >
> > --
> > Regards
> > Ankit Batra


[dpdk-dev] Defaults for rte_hash

2014-09-09 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
> Sent: Tuesday, September 09, 2014 11:32 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Defaults for rte_hash
> 
> Hello,
> 
> I was looking at the code which inits rte_hash objects in examples/l3fwd. It's
> using approx. 1M to 4M hash 'entries' depending on 32-bit vs 64-bit, but it's
> setting the 'bucket_entries' to just 4.
> 
> Normally I'm used to using somewhat deeper hash buckets than that... it seems
> like having a zillion little tiny hash buckets would cause more TLB pressure
> and memory overhead... or does 4 get shifted / exponentiated into 2**4 ?
> 
> The documentation in
> http://dpdk.org/doc/api/structrte__hash__parameters.html
> and http://dpdk.org/doc/api/rte__hash_8h.html isn't that clear... is there a
> better place to look for this?
> 
> In my case I'm looking to create a table of 4M or 8M entries, containing
> tables of security threat IPs / domains, to be detected in the traffic. So it
> would be good to have some understanding how not to waste a ton of memory
> on a
> table this huge without making it run super slow either.
> 
> Did anybody have some experience with how to get this right?

It might be worth looking too at the hash table structures in the librte_table 
directory for packet framework. These should give better scalability across 
millions of flows than the existing rte_hash implementation. [We're looking 
here to provide in the future a similar, more scalable, hash table 
implementation with an API like that of rte_hash, but that is still under 
development here at the moment.]

> 
> Another thing... the LPM table uses 16-bit Hop IDs. But I would probably have
> more than 64K CIDR blocks of badness on the Internet available to me for
> analysis. How would I cope with this, besides just letting some attackers
> escape unnoticed? ;)

Actually, I think the next hop field in the lpm implementation is only 8-bits, 
not 16 :-). Each lpm entry is only 16-bits in total.

> 
> Have we got some kind of structure which allows a greater number of CIDRs
> even
> if it's not quite as fast?
> 
> Thanks,
> Matthew.


[dpdk-dev] [PATCH 03/13] mbuf: add packet_type field

2014-09-09 Thread Richardson, Bruce


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, September 09, 2014 9:03 AM
> To: Zhang, Helin; Yerden Zhumabekov; Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 03/13] mbuf: add packet_type field
> 
> Hello,
> 
> On 09/09/2014 05:59 AM, Zhang, Helin wrote:
> > It is a common field which i40e PMD will use it to store the 'packet type 
> > ID'.
> i40e
> > hardware can recognize more than a hundred of packet types of received
> packets,
> > this is quite useful for upper layer stack or application. So this field is 
> > quite
> useful
> > and will be filled by PMD.
> > In ixgbe/igb, it has less than 10 packet types which are marked in offload 
> > flags.
> From
> > now on, it would be better to have new field here to put the hardware
> offloaded
> > packet type in and it could be used for future NICs.
> >
> >>
> >> I'm not saying this field is useless. But even if it's useful for some 
> >> applications
> >> like yours, it does not mean that it should go in the generic mbuf 
> >> structure.
> >>
> >> Also, for a new field, we should define who is in charge of filling it.
> >> Is is the driver? Does it mean that all drivers have to be modified to 
> >> fill it? Or
> is
> >> it just a placeholder for applications? In this case, shouldn't we use
> >> application-specific metadata? In the other direction (TX), we would also
> need
> >> to define if this field must be filled by the application before 
> >> transmitting a
> mbuf
> >> to a driver.
> > Yes, PMD will fill it. I40e PMD will be the first one, ixgbe/igb can be 
> > kept as it
> is, or
> > modified to be consistent. It is used for RX side only, and for TX side, it 
> > can be
> > investigated to see if it can be used also. I think some new features in
> development
> > can think of that.
> > Anyway, it is a quite useful field for i40e and future generation of NICs.
> 
> To me, having the support in a hardware for that feature is not a
> sufficient reason for adding this field. There are many hardware
> features that will never be integrated in dpdk.
> 
> This first version of the patch:
> 
> - just adds a field that is not used by any code, so it is useless.
>At least testpmd or an application example should show how to
>use it.
> 
> - does not describe what enhancement is provided by adding the
>field (performance? in this case, numbers + use case would help
>to convince people).
> 
> - does not describe what can be the content of the field. Is it
>a protocol number?
> 
> - does not explain if all drivers must fill this field. If yes,
>the patch has to update all drivers. If not, something must be
>done to mark the packet field as unknown by default.
> 
> Regards,
> Olivier

Hi,

Points taken. Really, this patch doesn't belong in this set as I had planned 
things and better belongs in patch set 3 (coming soon, I hope) which should 
propose the new field additions. I simply put it here to avoid having to start 
renumbering and renaming reserved fields in the structure, but that is possibly 
the lesser of the two evils.

However, with regards to adding new fields in, I would like to have some 
agreement that I can add fields in without actually pushing in the patch to use 
them - so long as sufficient rational is provided for using the field and there 
is a soon pending change to actually use it. This patch did not meet the 
criteria for explanation, but if updated to do so, I would like to have this 
patch accepted on the basis of that explanation so as to enable those working 
on the drivers to make us of it. 

Regards,
/Bruce


[dpdk-dev] [PATCH 0/6] Mbuf structure Rework, part 1

2014-09-09 Thread Richardson, Bruce


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, September 08, 2014 1:33 PM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/6] Mbuf structure Rework, part 1
> 
> Hi Bruce,
> 
> On 08/27/2014 05:50 PM, Bruce Richardson wrote:
> > This patch set does some initial pre-work to prepare the mbuf data structure
> > (and ixgbe vector driver to a lesser extent) for more major changes which
> > will follow on in a subsequent patch set. [See previous RFC patch set for
> > more indications of the future coming changes].
> >
> > The main changes here are the flattening out of the mbuf data structure, 
> > with
> > much of it based off work by Olivier. The ctrlmbuf and pktmbuf structures 
> > are
> > now gone, as is the vlan_macip structure. However, in this set, the concept
> > of having a separate ctrl mbuf type is kept around. The plan is in a later 
> > set
> > when we expand the flags field to 64-bits, we can use a single bit in the 
> > flags
> > to indicate a control packet. For now, though, the ctrlmbuf functions and
> macros
> > just are aliases for the pktmbuf equivalents as much as possible.
> 
> I'm wondering it "struct rte_kni_mbuf" should be updated
> accordingly each time "struct mbuf" is modified.
> 
> Regards,
> Olivier

Yes, it should. There are no changes needed for the part 1 patch set (6 
patches), as it does not change the actual field layout, it just flattens the C 
definitions. I've already started working on an update to the second patch set 
(13 patches) including kni changes as part of it.

/Bruce


[dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the mbuf metadata

2014-09-09 Thread Richardson, Bruce
> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, September 08, 2014 1:06 PM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 07/13] mbuf: use macros only to access the
> mbuf metadata
> 
> Hi Bruce,
> 
> On 09/03/2014 05:49 PM, Bruce Richardson wrote:
> > Removed the explicit zero-sized metadata definition at the end of the
> > mbuf data structure. Updated the metadata macros to take account of this
> > change so that all existing code which uses those macros still works.
> >
> > Signed-off-by: Bruce Richardson 
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 22 --
> >  1 file changed, 8 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 5260001..ca66d9a 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -166,31 +166,25 @@ struct rte_mbuf {
> > struct rte_mempool *pool; /**< Pool from which mbuf was allocated.
> */
> > struct rte_mbuf *next;/**< Next segment of scattered packet. */
> >
> > -   union {
> > -   uint8_t metadata[0];
> > -   uint16_t metadata16[0];
> > -   uint32_t metadata32[0];
> > -   uint64_t metadata64[0];
> > -   } __rte_cache_aligned;
> >  } __rte_cache_aligned;
> >
> >  #define RTE_MBUF_METADATA_UINT8(mbuf, offset)  \
> > -   (mbuf->metadata[offset])
> > +   (((uint8_t *)&(mbuf)[1])[offset])
> >  #define RTE_MBUF_METADATA_UINT16(mbuf, offset) \
> > -   (mbuf->metadata16[offset/sizeof(uint16_t)])
> > +   (((uint16_t *)&(mbuf)[1])[offset/sizeof(uint16_t)])
> >  #define RTE_MBUF_METADATA_UINT32(mbuf, offset) \
> > -   (mbuf->metadata32[offset/sizeof(uint32_t)])
> > +   (((uint32_t *)&(mbuf)[1])[offset/sizeof(uint32_t)])
> >  #define RTE_MBUF_METADATA_UINT64(mbuf, offset) \
> > -   (mbuf->metadata64[offset/sizeof(uint64_t)])
> > +   (((uint64_t *)&(mbuf)[1])[offset/sizeof(uint64_t)])
> >
> >  #define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)  \
> > -   (>metadata[offset])
> > +   (_MBUF_METADATA_UINT8(mbuf, offset))
> >  #define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset) \
> > -   (>metadata16[offset/sizeof(uint16_t)])
> > +   (_MBUF_METADATA_UINT16(mbuf, offset))
> >  #define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset) \
> > -   (>metadata32[offset/sizeof(uint32_t)])
> > +   (_MBUF_METADATA_UINT32(mbuf, offset))
> >  #define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset) \
> > -   (>metadata64[offset/sizeof(uint64_t)])
> > +   (_MBUF_METADATA_UINT64(mbuf, offset))
> >
> >  /**
> >   * Given the buf_addr returns the pointer to corresponding mbuf.
> >
> 
> I think it goes in the good direction. So:
> Acked-by: Olivier Matz 
> 
> Just one question: why not removing RTE_MBUF_METADATA*() macros?
> I'd just provide one macro that gives a (void*) to the first byte
> after the mbuf structure.
> 
> The format of the metadata is up to the application, that usually
> casts (m + 1) into a private structure, making the macros not very
> useful. I suggest to move these macros outside rte_mbuf.h, in the
> application-specific or library-specific header, what do you think?
> 
> Regards,
> Olivier
> 
Yes, I'll look into that.

/Bruce


[dpdk-dev] [PATCH 04/13] mbuf: expand ol_flags field to 64-bits

2014-09-09 Thread Richardson, Bruce
> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, September 08, 2014 11:26 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/13] mbuf: expand ol_flags field to 64-bits
> 
> Hi Bruce,
> 
> On 09/03/2014 05:49 PM, Bruce Richardson wrote:
> > The offload flags field (ol_flags) was 16-bits and had no further room
> > for expansion. This patch increases the field size to 64-bits, using up
> > the remaining reserved space in the single-cache-line mbuf.
> >
> > NOTE: none of the values for existing flags have been changed, i.e. no
> > new numbers have been explicitly reserved between existing flag
> > definitions.
> >
> > Signed-off-by: Bruce Richardson 
> 
> The initial series I've proposed [1][2] had on more enhancement: the
> first patch [1] allowed to remove the definition of flag names in
> testpmd. Indeed, this is not really good because they must be kept
> synchronized with the flags in rte_mbuf. What do you think about this
> patch? Should it be integrated in your series? Or later? Or never? ;)

No, it is a good change - I've just keep it out of my series for simplicity as 
I'm largely trying to keep the scope as small as possible. I would love to see 
that go in as a separate patch maybe once the mbuf rework is finished. 

> 
> The second patch [2] changes the value of the flags. This is not needed
> now, but if we do it in the future, we should not forget to change
> app/test-pmd/cmdline.c accordingly. Maybe this could go in your patch
> directly as it does not hurt?

As above for now. Right now I'm just trying to get the structure worked out, 
and deal with any performance regressions that are found (such as what Pablo 
found last Friday :-( ). 

/Bruce

> 
> Olivier
> 
> 
> [1] http://dpdk.org/ml/archives/dev/2014-May/002545.html
> [2] http://dpdk.org/ml/archives/dev/2014-May/002546.html


[dpdk-dev] [PATCH v2 3/6] mbuf: remove rte_ctrlmbuf

2014-09-09 Thread Richardson, Bruce
> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, September 08, 2014 9:22 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [PATCH v2 3/6] mbuf: remove rte_ctrlmbuf
> 
> Hi Bruce,
> 
> On 08/28/2014 05:42 PM, Bruce Richardson wrote:
> > From: Olivier Matz 
> >
> > The initial role of rte_ctrlmbuf is to carry generic messages (data
> > pointer + data length) but it's not used by the DPDK or it applications.
> > Keeping it implies:
> >   - loosing 1 byte in the rte_mbuf structure
> >   - having some dead code rte_mbuf.[ch]
> >
> > This patch removes this feature. Thanks to it, it is now possible to
> > simplify the rte_mbuf structure by merging the rte_pktmbuf structure
> > in it. This is done in next commit.
> >
> > Signed-off-by: Olivier Matz 
> >
> > * Updated patch to HEAD.
> > * Modified patch to retain the old function names for ctrl mbufs as
> >   macros. This helps with app compatibility, and allows the concept
> >   of a control mbuf to be reintroduced via a single-bit flag in
> >   a future change.
> > * Updated the packet framework ip_pipeline example application to
> >   work following this change.
> >
> > Changes in v2:
> > * Fixed whitespace errors introduced by this patch flagged by checkpatch
> >
> > Signed-off-by: Bruce Richardson 
> 
> To be honest, I'm not convinced that keeping the old function names
> is really required, but I suppose you had good reasons to reintroduce
> them. Just for information, is it for compatibility purpose or is there
> a real wish to reintroduce a sort of control mbuf in the future ?
> 
> Acked-by: Olivier Matz 

Compatibility primarily. However, it's a useful enough concept, and can be 
controlled by having a single-bit flag as done in my second patch set.

/Bruce


[dpdk-dev] [PATCH v2 6/6] mbuf: flatten struct vlan_macip into mbuf struct

2014-09-07 Thread Richardson, Bruce
> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Friday, September 05, 2014 5:21 PM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 6/6] mbuf: flatten struct vlan_macip into
> mbuf struct
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, August 28, 2014 4:43 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 6/6] mbuf: flatten struct vlan_macip into
> > mbuf struct
> >
> > The vlan_macip structure combined a vlan tag id with l2 and l3 headers
> > lengths for tracking offloads. However, this structure was only used as
> > a unit by the e1000 and ixgbe drivers, not generally.
> >
> > This patch removes the structure from the mbuf header and places the
> > fields into the mbuf structure directly at the required point, without
> > any net effect on the structure layout. This allows us to treat the vlan
> > tags and header length fields as separate for future mbuf changes. The
> > drivers which were written to use the combined structure still do so,
> > using a driver-local definition of it.
> >
> > Changes in V2:
> > * None
> >
> > Signed-off-by: Bruce Richardson 
> 
> After applying this patch, I see a performance degradation (around 5%) using
> testpmd with the default RX path, so this may require a v3 patch.
> 
> Thanks,
> Pablo

Thanks for Pablo for flagging this. Since no fields are moved in the structure, 
merely flattened, I suspect any degradation must come from having the l2_len 
and l3_len bit fields split out. I'll do up a v3 with a union to allow them to 
be assigned simultaneously and see if it helps things. Fast-path is unaffected 
in my tests.

/Bruce


[dpdk-dev] Are there considerations about resource recycling by the process, not by the OS?

2014-09-05 Thread Richardson, Bruce
Can you perhaps send on the log and the traceback from the panic call. From the 
error message we may be able to tell you why your process is failing.

/Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhangkun (K)
> Sent: Friday, September 05, 2014 4:40 AM
> To: Masaru Oki
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Are there considerations about resource recycling by 
> the
> process, not by the OS?
> 
> Hi,
> Yes, the program is running as root.
> I want to know if or not the DPDK Community considers about resource
> recycling.
> 
> From: Masaru Oki [mailto:m-oki at stratosphere.co.jp]
> Date: 2014?9?5? 11:18
> To: Zhangkun (K)
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Are there considerations about resource recycling by 
> the
> process, not by the OS?
> 
> Hi,
> 
> you may try
>  $ gdb your_program
>  gdb> b rte_panic
>  gdb> run
>  stopped at rte_panic
>  gdb> bt
> 
> sorry, it is not answer for your question.
> by the way, do you run the program as root?
> 
> 
> 2014-09-05 12:07 GMT+09:00 Zhangkun (K)
> mailto:zhang.zhangkun at huawei.com>>:
> Hi,
>I develop the program with the use of dpdk and the process is core dump 
> when
> the program is started.
> The process is crash in case of initing memory failed and calling rte_panic
> function.
>  I search all rte_panic function in the dpdk, and find it so many. Therefore, 
> in
> case the pr ocess happen fail,
> it will abort. The resource allocated before is not released by the process, 
> but by
> the OS system.
> So I have a question: Are there considerations about resource recycling by the
> process, not by the OS ?



[dpdk-dev] reg : adding grub entry with hugepages.

2014-09-01 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhang, Jerry
> Sent: Monday, September 01, 2014 10:37 AM
> To: Anand S Angadi
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] reg : adding grub entry with hugepages.
> 
> >> Hello everyone,
> >> I am using fedora 16, i want to Add additional Grub entry with hugepages
> >enabled permanently can u tell me how can i add?
> >> and wher can i add?
> >>
> >> --
> >> Thanks & Regards,
> >> ANAND
> >>
> >>
> >>
> >>
> >Hi Zhang,
> >Thank you for your reply, now I am using same thing but i want it permanently
> to
> >add. means after every time reboot i have write again it, please help me.
> 
> You may add this parameter in /etc/default/grub
> 
> GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G
> hugepages=8 hugepagesz=2M hugepages=512"

Again, please note that after changing the grub defaults you still need to 
regenerate the grub configuration file, which is generally in /boot (on fedora 
20 it's in /boot/grub2/grub.cfg). This is the file actually read at boot time 
by grub.

/Bruce


[dpdk-dev] reg : adding grub entry with hugepages.

2014-09-01 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhang, Jerry
> Sent: Monday, September 01, 2014 9:52 AM
> To: Anand S Angadi; dev at dpdk.org
> Subject: Re: [dpdk-dev] reg : adding grub entry with hugepages.
> 
> Hi,
> 
>   Here is an example to enable both 1G and 2M in kernel parameters.
> 
> default_hugepagesz=1G hugepagesz=1G hugepages=8 hugepagesz=2M
> hugepages=512

Just FYI: the default_hugepagesz part is not necessary, so I'd generally leave 
it out and let the system default be 2MB as normal. You can always get a 
hugetlbfs mount of a particular hugepage size by passing the pagesize= 
parameter when doing the mount.

/Bruce


  1   2   3   >