[dpdk-dev] [PATCH] example/ipsec-secgw: ipsec security gateway

2016-03-09 Thread Sergio Gonzalez Monroy
On 01/02/2016 11:26, Jerin Jacob wrote:
> On Mon, Feb 01, 2016 at 11:09:16AM +, Sergio Gonzalez Monroy wrote:
>> On 31/01/2016 14:39, Jerin Jacob wrote:
>>> On Fri, Jan 29, 2016 at 08:29:12PM +, Sergio Gonzalez Monroy wrote:
>>>

>>> IMO, an option for single SA based outbound processing would be useful
>>> measuring performance bottlenecks with SA lookup.
>>>
>> Hi Jerin,
>>
>> Are you suggesting to have an option so we basically encrypt all traffic
>> using
>> a single SA bypassing the SP/ACL ?
> Yes. Basically an option to bypass "rte_acl_classify" if it's for the single
> SA use case.
>
>

Hi Jerin,

After re-reading your comment regarding the single SA, I just want to
double check that I understood correctly what you were suggesting.

Basically, an option by which we provide a single SA to use for outbound,
skipping rte_acl_classify in the outbound path.
That same option would also skip rte_acl_classify in the inbound path,
without checking that we accept specific traffic for an SA.

Is that correct?
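
For illustration, a minimal sketch of what such a single-SA bypass could
look like in the outbound path (names like single_sa_idx and sp_classify()
are purely illustrative, not the actual ipsec-secgw code):

#include <stdint.h>
#include <rte_mbuf.h>

/* illustrative stand-in for the SP/ACL lookup (rte_acl_classify based) */
void sp_classify(struct rte_mbuf *pkts[], uint32_t sa_idx[], uint16_t n);

static void
classify_outbound(struct rte_mbuf *pkts[], uint32_t sa_idx[], uint16_t n,
		int32_t single_sa_idx)
{
	uint16_t i;

	if (single_sa_idx >= 0) {
		/* single-SA mode: skip the SP/ACL stage entirely and
		 * encrypt everything with the one configured SA */
		for (i = 0; i < n; i++)
			sa_idx[i] = (uint32_t)single_sa_idx;
		return;
	}

	/* normal mode: the SP/ACL lookup picks an SA per packet */
	sp_classify(pkts, sa_idx, n);
}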

Sergio


[dpdk-dev] [PATCH v5 0/2] Increased number of next hops for LPM IPv4.

2016-03-09 Thread Thomas Monjalon
2016-03-09 17:57, Michal Jastrzebski:
> From: Michal Kobylinski 
> 
> This patchset extends the next_hop field from 8 bits to 24 bits in the
> LPM library for IPv4.
> 
> As the next_hop field is increased, the maximum number of tbl8s is now 2^24.
> A new rte_lpm_config structure is used so the LPM library will allocate
> exactly the amount of memory necessary to hold the application's rules.
> 
> Added versioning symbols to functions and updated
> library and applications that have a dependency on LPM library.
> 
> Michal Kobylinski (2):
>   lpm: extended ipv4 next_hop field
>   lpm: added a new rte_lpm_config structure for ipv4

Applied with few changes in the release notes, thanks.


[dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores

2016-03-09 Thread Tan, Jianfeng
Hi Konstantin,

On 3/9/2016 10:44 PM, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: Tan, Jianfeng
>> Sent: Wednesday, March 09, 2016 2:17 PM
>> To: Ananyev, Konstantin; Panu Matilainen; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] eal: add option --avail-cores to detect 
>> lcores
>>
>>
>>
>> On 3/9/2016 10:01 PM, Ananyev, Konstantin wrote:
 -Original Message-
 From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tan, Jianfeng
 Sent: Wednesday, March 09, 2016 1:53 PM
 To: Panu Matilainen; dev at dpdk.org
 Subject: Re: [dpdk-dev] [PATCH] eal: add option --avail-cores to detect 
 lcores



 On 3/9/2016 9:05 PM, Panu Matilainen wrote:
> On 03/08/2016 07:38 PM, Tan, Jianfeng wrote:
>> Hi Panu,
>>
>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
 This patch adds option, --avail-cores, to use lcores which are
 available
 by calling pthread_getaffinity_np() to narrow down detected cores
 before
 parsing coremask (-c), corelist (-l), and coremap (--lcores).

 Test example:
 $ taskset 0xc ./examples/helloworld/build/helloworld \
   --avail-cores -m 1024

 Signed-off-by: Jianfeng Tan 
 Acked-by: Neil Horman 
>>> Hmm, to me this sounds like something that should be done always so
>>> there's no need for an option. Or if there's a chance it might do the
>>> wrong thing in some rare circumstance then perhaps there should be a
>>> disabler option instead?
>> Thanks for comments.
>>
>> Yes, there's a use case that we cannot handle.
>>
>> If we make it as default, DPDK applications may fail to start, when user
>> specifies a core in isolcpus and its parent process (say bash) has a
>> cpuset affinity that excludes isolcpus. Originally, DPDK applications
>> just blindly do pthread_setaffinity_np() and it always succeeds because
>> it always has root privilege to change any cpu affinity.
>>
>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
>> flagged as undetected (in my older implementation) and leads to failure.
>> To make it correct, we would always add "taskset mask" (or other ways)
>> before DPDK application cmd lines.
>>
>> How do you think?
> I still think it sounds like something that should be done by default
> and maybe be overridable with some flag, rather than the other way
> around. Another alternative might be detecting the cores always but if
> running as root, override but with a warning.
 For your second solution, only root can setaffinity to isolcpus?
 Your first solution seems like a promising way for me.

> But I dont know, just wondering. To look at it from another angle: why
> would somebody use this new --avail-cores option and in what
> situation, if things "just work" otherwise anyway?
 For DPDK applications, the most common case to initialize DPDK is like
 this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
 to specify which cores to run and how much hugepages are used. Suppose
 we need this dpdk-app to run in a container, users already give those
 information when they build up the cgroup for it to run inside, this
 option or this patch is to make DPDK more smart to discover how much
 resource will be used. Make sense?
>>> But then, all we need might be just a script that would extract this 
>>> information from the system
>>> and form a proper cmdline parameter for DPDK?
>> Yes, a script will work. Or to construct (argc, argv) to call
>> rte_eal_init() in the application. But as Neil Horman once suggested, a
>> simple pthread_getaffinity_np() will get all things done. So if it worth
>> a patch here?
> Don't know...
> Personally I would prefer not to put extra logic inside EAL.
> For me - there are too many different options already.

Then how about making it the default in rte_eal_cpu_init()? It is already
known that this will cause trouble for users of isolcpus; they will need to
add "taskset [mask]" before starting a DPDK app.

>  From other side looking at the patch itself:
> You are updating lcore_count and lcore_config[],based on physical cpu 
> availability,
> but these days it is not always one-to-one mapping between EAL lcore and 
> physical cpu.
> Shouldn't that be taken into account?

I have not seen the problem so far, because this work is done before
parsing the coremask (-c), corelist (-l), and coremap (--lcores). If a core
is disabled here, it is as if it had not been detected in rte_eal_cpu_init().
Or could you please give more hints?
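
For illustration, the idea is roughly the following (a sketch only, not
the actual EAL code; the helper name is made up):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Narrow the set of detected cores to those in the process affinity
 * mask, before -c/-l/--lcores are parsed. Cores outside the mask are
 * treated as if they had never been detected. */
static void
narrow_detected_cores(unsigned int max_cores, unsigned char detected[])
{
	cpu_set_t cpuset;
	unsigned int i;

	if (pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset) != 0)
		return;	/* on failure, keep the original detection */

	for (i = 0; i < max_cores && i < CPU_SETSIZE; i++)
		if (!CPU_ISSET(i, &cpuset))
			detected[i] = 0;
}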

Thanks,
Jianfeng

> Konstantin
>   
>
>



[dpdk-dev] [PATCH v2 1/5] mem: add --single-file to create single mem-backed file

2016-03-09 Thread Tan, Jianfeng
Hi,

On 3/8/2016 10:44 AM, Yuanhan Liu wrote:
> On Tue, Mar 08, 2016 at 09:55:10AM +0800, Tan, Jianfeng wrote:
>> Hi Yuanhan,
>>
>> On 3/7/2016 9:13 PM, Yuanhan Liu wrote:
>>> CC'ed EAL hugepage maintainer, which is something you should do when
>>> send a patch.
>> Thanks for doing this.
>>
>>> On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
 Originally, there are two cons in using hugepages: a. needs root
 privilege to touch /proc/self/pagemap, which is a prerequisite to
 allocate physically contiguous memsegs; b. possibly too many
 hugepage files are created, especially when used with 2M hugepages.

 For virtual devices, they don't care about physical contiguity
 of the allocated hugepages at all. Option --single-file is to
 provide a way to allocate all hugepages into a single mem-backed
 file.

 Known issues:
 a. the single-file option relies on the kernel to allocate numa-affinitive
 memory.
 b. possible ABI break: originally, --no-huge uses anonymous memory
 instead of a file-backed way to create memory.

 Signed-off-by: Huawei Xie 
 Signed-off-by: Jianfeng Tan 
>>> ...
 @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config 
 *internal_cfg)
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
}
 +  if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
 +  RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
 +  "be specified together with --"OPT_SOCKET_MEM"\n");
 +  return -1;
 +  }
 +  if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
 +  RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
 +  "be specified together with --"OPT_SINGLE_FILE"\n");
 +  return -1;
 +  }
>>> The two limitations don't make sense to me.
>> For the force_sockets option, my original thought on --single-file option
>> is, we don't sort those pages (require root/cap_sys_admin) and even don't
>> look up numa information because it may contain both sockets' memory.
>>
>> For the hugepage_unlink option, those hugepage files get closed at the end
>> of memory initialization; if we also unlink those hugepage files, we
>> cannot share them with other processes (say, a backend).
> Yeah, I know how the two limitations come, from your implementation. I
> was just wondering if they both are __truly__ the limitations. I mean,
> can we get rid of them somehow?
>
> For --socket-mem option, if we can't handle it well, or if we could
> ignore the socket_id for allocated huge page, yes, the limitation is
> a true one.

To make it work with the --socket-mem option, we need to call
mbind()/set_mempolicy(), which means adding "LDFLAGS += -lnuma" as a
mandatory line in the mk file. I don't know if it's acceptable to bring in a
dependency on libnuma.so?
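
For reference, the kind of call I mean is roughly the following (a sketch
only; error handling omitted and the helper name is made up):

#include <stddef.h>
#include <numaif.h>	/* mbind(), links against -lnuma */

/* Bind an already mmap()ed region to one NUMA node so the kernel backs
 * it with memory from that socket. */
static int
bind_region_to_socket(void *addr, size_t len, unsigned int socket_id)
{
	unsigned long nodemask = 1UL << socket_id;

	return mbind(addr, len, MPOL_BIND, &nodemask,
			sizeof(nodemask) * 8, MPOL_MF_STRICT);
}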


>
> But for the second option, no, we should be able to make it work well
> together. One extra action is that you should not invoke "close(fd)" for
> those huge page files. And then you can get all the information as I stated
> in a reply to your 2nd patch.

As discussed yesterday, I think there is an open-files limit for each
process; if we keep those FDs open, it will cause failures in existing
programs. Do others consider this a problem?
...
>>> BTW, since we already have SINGLE_FILE_SEGMENTS (config) option, adding
>>> another option --single-file looks really confusing to me.
>>>
>>> To me, maybe you could base the SINGLE_FILE_SEGMENTS option, and add
>>> another option, say --no-sort (I confess this name sucks, but you get
>>> my point). With that, we could make sure to create as few huge page
>>> files as possible, to fit your case.
>> This is a great advice. So how do you think of --converged, or
>> --no-scattered-mem, or any better idea?
> TBH, none of them looks great to me, either. But I have no better
> options. Well, --no-phys-continuity looks like the best option to
> me so far :)

I'd like to make it a little more concise; how about --no-phys-contig?
In addition, Yuanhan thinks the name still does not literally convey that we
create just one file per hugetlbfs mount (or socket). But from my side,
there is an indirect meaning: if there is no need to guarantee physical
contiguity, there is no need to create hugepages one by one. Can anyone
give an opinion here? Thanks.

Thanks,
Jianfeng


[dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores

2016-03-09 Thread Tan, Jianfeng


On 3/9/2016 10:01 PM, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tan, Jianfeng
>> Sent: Wednesday, March 09, 2016 1:53 PM
>> To: Panu Matilainen; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] eal: add option --avail-cores to detect 
>> lcores
>>
>>
>>
>> On 3/9/2016 9:05 PM, Panu Matilainen wrote:
>>> On 03/08/2016 07:38 PM, Tan, Jianfeng wrote:
 Hi Panu,

 On 3/8/2016 4:54 PM, Panu Matilainen wrote:
> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
>> This patch adds option, --avail-cores, to use lcores which are
>> available
>> by calling pthread_getaffinity_np() to narrow down detected cores
>> before
>> parsing coremask (-c), corelist (-l), and coremap (--lcores).
>>
>> Test example:
>> $ taskset 0xc ./examples/helloworld/build/helloworld \
>>  --avail-cores -m 1024
>>
>> Signed-off-by: Jianfeng Tan 
>> Acked-by: Neil Horman 
> Hmm, to me this sounds like something that should be done always so
> there's no need for an option. Or if there's a chance it might do the
> wrong thing in some rare circumstance then perhaps there should be a
> disabler option instead?
 Thanks for comments.

 Yes, there's a use case that we cannot handle.

 If we make it as default, DPDK applications may fail to start, when user
 specifies a core in isolcpus and its parent process (say bash) has a
 cpuset affinity that excludes isolcpus. Originally, DPDK applications
 just blindly do pthread_setaffinity_np() and it always succeeds because
 it always has root privilege to change any cpu affinity.

 Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
 flagged as undetected (in my older implementation) and leads to failure.
 To make it correct, we would always add "taskset mask" (or other ways)
 before DPDK application cmd lines.

 How do you think?
>>> I still think it sounds like something that should be done by default
>>> and maybe be overridable with some flag, rather than the other way
>>> around. Another alternative might be detecting the cores always but if
>>> running as root, override but with a warning.
>> For your second solution, only root can setaffinity to isolcpus?
>> Your first solution seems like a promising way for me.
>>
>>> But I dont know, just wondering. To look at it from another angle: why
>>> would somebody use this new --avail-cores option and in what
>>> situation, if things "just work" otherwise anyway?
>> For DPDK applications, the most common case to initialize DPDK is like
>> this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
>> to specify which cores to run and how much hugepages are used. Suppose
>> we need this dpdk-app to run in a container, users already give those
>> information when they build up the cgroup for it to run inside, this
>> option or this patch is to make DPDK more smart to discover how much
>> resource will be used. Make sense?
> But then, all we need might be just a script that would extract this 
> information from the system
> and form a proper cmdline parameter for DPDK?

Yes, a script would work. Or we could construct (argc, argv) to call
rte_eal_init() in the application. But as Neil Horman once suggested, a
simple pthread_getaffinity_np() gets everything done. So is it worth
a patch here?

Thanks,
Jianfeng

> Konstantin
>
>> Thanks,
>> Jianfeng
>>
>>
>>>  - Panu -
>>>



[dpdk-dev] [PATCH v9 0/4] ethdev: add speed capabilities and refactor link API

2016-03-09 Thread Marc
On 9 March 2016 at 11:09, Nélio Laranjeiro 
wrote:

> On Wed, Mar 09, 2016 at 10:29:38AM +0100, Nélio Laranjeiro wrote:
> > On Tue, Mar 08, 2016 at 05:53:05PM +0100, Nélio Laranjeiro wrote:
> > > On Tue, Mar 08, 2016 at 04:00:29PM +0100, Marc Sune wrote:
> > > > 2016-03-01 1:45 GMT+01:00 Marc Sune :
> > > >
> > > > > The current rte_eth_dev_info abstraction does not provide any
> mechanism to
> > > > > get the supported speed(s) of an ethdev.
> > > > >
> > > > > For some drivers (e.g. ixgbe), an educated guess could be done
> based on the
> > > > > driver's name (driver_name in rte_eth_dev_info), see:
> > > > >
> > > > > http://dpdk.org/ml/archives/dev/2013-August/000412.html
> > > > >
> > > > > However, i) doing string comparisons is annoying, and can silently
> > > > > break existing applications if PMDs change their names, ii) it does
> > > > > not provide all the supported capabilities of the ethdev, and iii) for
> > > > > some drivers it is impossible to correctly determine the (max) speed
> > > > > in the application (e.g. in i40e, distinguish between XL710 and X710).
> > > > >
> > > > > In addition, the link APIs do not allow defining a set of advertised
> > > > > link speeds for auto-negotiation.
> > > > >
> > > > > This series of patches adds the following capabilities:
> > > > >
> > > > > * speed_capa bitmap in rte_eth_dev_info, which is filled by the
> PMDs
> > > > >   according to the physical device capabilities.
> > > > > * refactors link API in ethdev to allow the definition of the
> advertised
> > > > >   link speeds, fix speed (no auto-negociation) or advertise all
> supported
> > > > >   speeds (default).
> > > > >
> > > > > WARNING: this patch series, specifically 3/4, is NOT tested for
> most of the
> > > > > PMDs, due to the lack of hardware. Only generic EM is tested (VM).
> > > > > Reviewing
> > > > > and testing required by PMD maintainers.
> > > > >
> > > > > * * * * *
> > > > >
> > > > > v2: rebase, converted speed_capa into 32 bits bitmap, fixed
> alignment
> > > > > (checkpatch).
> > > > >
> > > > > v3: rebase to v2.1. unified ETH_LINK_SPEED and ETH_SPEED_CAP into
> > > > > ETH_SPEED.
> > > > > Converted field speed in struct rte_eth_conf to speed, to
> allow a
> > > > > bitmap
> > > > > for defining the announced speeds, as suggested M. Brorup.
> Fixed
> > > > > spelling
> > > > > issues.
> > > > >
> > > > > v4: fixed errata in the documentation of field speeds of
> rte_eth_conf, and
> > > > > commit 1/2 message. rebased to v2.1.0. v3 was incorrectly
> based on
> > > > > ~2.1.0-rc1.
> > > > >
> > > > > v5: revert to v2 speed capabilities patch. Fixed MLX4 speed
> capabilities
> > > > > (thanks N. Laranjeiro). Refactored link speed API to allow
> setting
> > > > > advertised speeds (3/4). Added NO_AUTONEG option to
> explicitely disable
> > > > > auto-negociation. Updated 2.2 rel. notes (4/4). Rebased to
> current
> > > > > HEAD.
> > > > >
> > > > > v6: Move link_duplex to be part of bitfield. Fixed i40 autoneg
> flag link
> > > > > update code. Added rte_eth_speed_to_bm_flag() to .map file.
> Fixed other
> > > > > spelling issues. Rebased to current HEAD.
> > > > >
> > > > > v7: Rebased to current HEAD. Moved documentation to v2.3. Still
> needs
> > > > > testing
> > > > > from PMD maintainers.
> > > > >
> > > > > v8: Rebased to current HEAD. Modified em driver impl. to not touch
> base
> > > > > files.
> > > > > Merged patch 5 into 3 (map file). Changed numeric speed to a
> 64 bit
> > > > > value.
> > > > > Filled-in speed capabilities for drivers bnx2x, cxgbe, mlx5
> and nfp in
> > > > > addition to the ones of previous patch sets.
> > > > >
> > > > > v9: rebased to current HEAD. Reverted numeric speed to 32 bit in
> struct
> > > > > rte_eth_link (no atomic link get > 64bit). Fixed mlx5 driver
> > > > > compilation
> > > > > and link speeds. Moved documentation to release_16_04.rst and
> fixed
> > > > > several
> > > > > issues. Upgrade NIC notes with speed capabilities.
> > > > >
> > > >
> > > > Anyone interested in reviewing and _testing_ this series?
> > > >
> > > > Thank you
> > > > Marc
> > >
> > > Hi Marc,
> > >
> > > I will take a look tomorrow morning and run test on Mellanox NICs
> > > (ConnectX 3 and 4).
> > >
> > > I do not have access to the others NICs, if those who have can do
> > > it, could be really great.
> > >
> > > Regards,
> >
> > It works as expected with Mellanox NICs.
> >
> > Regards,
> >
> > --
> > Nélio Laranjeiro
> > 6WIND
>
> Tested-by: Nelio Laranjeiro 
>
> - OS/Kernel: Debian 8/3.16.0-4-amd64
> - GCC: gcc (Debian 4.9.2-10) 4.9.2
> - CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
> - MLNX OFED: 3.2-2.0.0.0
> - NIC: ConnectX 4 100G
>
> - OS/Kernel: Debian 7/3.16.0-0.bpo.4-amd64
> - GCC: gcc (Debian 4.7.2-5) 4.7.2
> - CPU: Intel(R) Xeon(R) CPU E5-2648L 0 @ 1.80GHz
> - MLNX OFED: 3.2-2.0.0.0
> - NIC: ConnectX 3 Pro
>
>
> 1. Link displayed at the correct negotiated speed:

[dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores

2016-03-09 Thread Tan, Jianfeng


On 3/9/2016 9:05 PM, Panu Matilainen wrote:
> On 03/08/2016 07:38 PM, Tan, Jianfeng wrote:
>> Hi Panu,
>>
>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
 This patch adds option, --avail-cores, to use lcores which are 
 available
 by calling pthread_getaffinity_np() to narrow down detected cores 
 before
 parsing coremask (-c), corelist (-l), and coremap (--lcores).

 Test example:
 $ taskset 0xc ./examples/helloworld/build/helloworld \
 --avail-cores -m 1024

 Signed-off-by: Jianfeng Tan 
 Acked-by: Neil Horman 
>>>
>>> Hmm, to me this sounds like something that should be done always so
>>> there's no need for an option. Or if there's a chance it might do the
>>> wrong thing in some rare circumstance then perhaps there should be a
>>> disabler option instead?
>>
>> Thanks for comments.
>>
>> Yes, there's a use case that we cannot handle.
>>
>> If we make it as default, DPDK applications may fail to start, when user
>> specifies a core in isolcpus and its parent process (say bash) has a
>> cpuset affinity that excludes isolcpus. Originally, DPDK applications
>> just blindly do pthread_setaffinity_np() and it always succeeds because
>> it always has root privilege to change any cpu affinity.
>>
>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
>> flagged as undetected (in my older implementation) and leads to failure.
>> To make it correct, we would always add "taskset mask" (or other ways)
>> before DPDK application cmd lines.
>>
>> How do you think?
>
> I still think it sounds like something that should be done by default 
> and maybe be overridable with some flag, rather than the other way 
> around. Another alternative might be detecting the cores always but if 
> running as root, override but with a warning.

Regarding your second solution: can only root set affinity to isolcpus?
Your first solution seems like a promising way to me.

>
> But I dont know, just wondering. To look at it from another angle: why 
> would somebody use this new --avail-cores option and in what 
> situation, if things "just work" otherwise anyway?

For DPDK applications, the most common way to initialize DPDK is like
this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
to specify which cores to run on and how many hugepages to use. Suppose
we need this dpdk-app to run in a container: users already provide that
information when they set up the cgroup for it to run inside, so this
option (this patch) is to make DPDK smart enough to discover how much of
the resources it can use. Does that make sense?

Thanks,
Jianfeng


>
> - Panu -
>



[dpdk-dev] [RFC 10/35] eal: introduce RTE_DECONST macro

2016-03-09 Thread Olivier MATZ
Hi,

On 03/09/2016 07:53 PM, Stephen Hemminger wrote:
> Can't we just write correct code rather than trying to trick the compiler.

Thank you for your comment. This macro is introduced for next
commit, I would be happy if you could help me to remove it.

My opinion is that using a macro like this is cleaner than doing a
discreet cast that nobody notices, because it is explicit. The const qualifier
is not only for the compiler, but also for people reading the code.

In this case, the objective is to be able to do the following:

uint32_t rte_mempool_obj_iter(struct rte_mempool *mp,
   rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg)
{
/* call a function on all objects of a mempool */
}

static void
mempool_obj_audit(struct rte_mempool *mp,
__rte_unused void *opaque, void *obj, __rte_unused unsigned idx)
{
/* do some check on one mempool object */
}


void rte_mempool_audit(const struct rte_mempool *mp)
{
/* iterate objects in mempool using rte_mempool_obj_iter() */
}


In the public API:

- rte_mempool_obj_iter() has the proper prototype: this function
  can be used to make rw access to the mempool
- rte_mempool_audit() has the proper public prototype: this function
  won't modify the mempool

Internally:
- we use a deconst to be able to make use of rte_mempool_obj_iter(),
  but we call a static function that won't modify the mempool.

Note that this kind of macro is also used in projects like FreeBSD:
http://fxr.watson.org/fxr/ident?i=__DECONST
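
For reference, such a macro (modeled on FreeBSD's __DECONST; the exact
form in the RFC may differ) and its use in the audit case would be
roughly:

#include <stdint.h>	/* uintptr_t */

#define RTE_DECONST(type, var) ((type)(uintptr_t)(const void *)(var))

void rte_mempool_audit(const struct rte_mempool *mp)
{
	/* drop const explicitly; the callback never writes to the mempool */
	rte_mempool_obj_iter(RTE_DECONST(struct rte_mempool *, mp),
		mempool_obj_audit, NULL);
}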

You can also find many examples in Linux kernel where const qualifier
is silently dropped. For instance, you can grep the following in Linux:
 "git grep 'iov_base = (void \*)'"

If you have a better alternative, without duplicating the code,
I'll be happy to learn.


Thanks,
Olivier


[dpdk-dev] [PATCH] virtio: fix rx ring descriptor starvation

2016-03-09 Thread Bruce Richardson
On Fri, Mar 04, 2016 at 08:25:07AM -0500, Kyle Larose wrote:
> On Fri, Mar 4, 2016 at 3:11 AM, Tom Kiely  wrote:
> > Sure.
> >Tom
> >
> >
> > On 03/04/2016 06:16 AM, Xie, Huawei wrote:
> >>
> >> On 2/23/2016 12:23 AM, Tom Kiely wrote:
> >>>
> >>> Hi,
> >>>  Sorry I missed the last few messages until now. I'm happy with
> >>> just removing the "if". Kyle, when you say you fixed it, do you mean
> >>> that you will push the patch or have already done so ?
> >>> Thanks,
> >>> Tom
> >>
> >> Could you please send the patch?
> >>
> >
> 
> I should have replied to this earlier. I submitted a patch last week:
> http://dpdk.org/dev/patchwork/patch/10904/

Thanks, Kyle. Unfortunately the patch you submitted is missing your signoff.
Can you perhaps resubmit it as a V2 with the necessary sign-off as described
in the contributors guide:
http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-body

Huawei or Tom, could one of you guys perhaps review and ack the patch once it's
submitted with a signoff?

Thanks,
/Bruce


[dpdk-dev] [RFC 10/35] eal: introduce RTE_DECONST macro

2016-03-09 Thread Bruce Richardson
On Wed, Mar 09, 2016 at 09:47:35PM +0100, Olivier MATZ wrote:
> Hi,
> 
> On 03/09/2016 07:53 PM, Stephen Hemminger wrote:
> > Can't we just write correct code rather than trying to trick the compiler.
> 
> Thank you for your comment. This macro is introduced for next
> commit, I would be happy if you could help me to remove it.
> 
> My opinion is that using a macro like this is cleaner than doing a
> discreet cast that nobody notices, because it is explicit. The const qualifier
> is not only for the compiler, but also for people reading the code.
> 
> In this case, the objective is to be able to do the following:
> 
> uint32_t rte_mempool_obj_iter(struct rte_mempool *mp,
>rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg)
> {
>   /* call a function on all objects of a mempool */
> }
> 
> static void
> mempool_obj_audit(struct rte_mempool *mp,
>   __rte_unused void *opaque, void *obj, __rte_unused unsigned idx)
> {
>   /* do some check on one mempool object */
> }
> 
> 
> void rte_mempool_audit(const struct rte_mempool *mp)
> {
>   /* iterate objects in mempool using rte_mempool_obj_iter() */
> }
> 
> 
> In the public API:
> 
> - rte_mempool_obj_iter() has the proper prototype: this function
>   can be used to make rw access to the mempool
> - rte_mempool_audit() has the proper public prototype: this function
>   won't modify the mempool
> 
> Internally:
> - we use a deconst to be able to make use of rte_mempool_obj_iter(),
>   but we call a static function that won't modify the mempool.
> 
> Note that this kind of macro is also used in projects like FreeBSD:
> http://fxr.watson.org/fxr/ident?i=__DECONST
> 
> You can also find many examples in Linux kernel where const qualifier
> is silently dropped. For instance, you can grep the following in Linux:
>  "git grep 'iov_base = (void \*)'"
> 
> If you have a better alternative, without duplicating the code,
> I'll be happy to learn.
> 
I really don't like this dropping of const either, but I do see the problem.
I'd nearly rather see two copies of the function than start dropping the const
in such a way. Also, I'd see having the function itself be a wrapper around a
macro as a better alternative too, assuming such a construction is possible.

/Bruce


[dpdk-dev] [PATCH] doc: fix API change in release note

2016-03-09 Thread Jingjing Wu
Move the structure ``rte_eth_fdir_masks`` change announcement from ABI
to API in release note.

Fixes: 1409f127d7f1 (ethdev: fix byte order consistency of flow director)
Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/release_16_04.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 96f144e..4c86660 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -137,6 +137,9 @@ This section should contain API changes. Sample format:
 * Add a short 1-2 sentence description of the API change. Use fixed width
   quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past 
tense.

+* The fields in ethdev structure ``rte_eth_fdir_masks`` were changed
+  to be in big endian.
+

 ABI Changes
 ---
@@ -145,9 +148,6 @@ ABI Changes
   the previous releases and made in this release. Use fixed width quotes for
   ``rte_function_names`` or ``rte_struct_names``. Use the past tense.

-* The fields in ethdev structure ``rte_eth_fdir_masks`` were changed
-  to be in big endian.
-

 Shared Library Versions
 ---
-- 
2.4.0



[dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores

2016-03-09 Thread Ananyev, Konstantin


>  On 3/8/2016 4:54 PM, Panu Matilainen wrote:
> > On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
> >> This patch adds option, --avail-cores, to use lcores which are
> >> available
> >> by calling pthread_getaffinity_np() to narrow down detected cores
> >> before
> >> parsing coremask (-c), corelist (-l), and coremap (--lcores).
> >>
> >> Test example:
> >> $ taskset 0xc ./examples/helloworld/build/helloworld \
> >>--avail-cores -m 1024
> >>
> >> Signed-off-by: Jianfeng Tan 
> >> Acked-by: Neil Horman 
> > Hmm, to me this sounds like something that should be done always so
> > there's no need for an option. Or if there's a chance it might do 
> > the
> > wrong thing in some rare circumstance then perhaps there should be a
> > disabler option instead?
>  Thanks for comments.
> 
>  Yes, there's a use case that we cannot handle.
> 
>  If we make it as default, DPDK applications may fail to start, when 
>  user
>  specifies a core in isolcpus and its parent process (say bash) has a
>  cpuset affinity that excludes isolcpus. Originally, DPDK applications
>  just blindly do pthread_setaffinity_np() and it always succeeds 
>  because
>  it always has root privilege to change any cpu affinity.
> 
>  Now, if we do the checking in rte_eal_cpu_init(), those lcores will 
>  be
>  flagged as undetected (in my older implementation) and leads to 
>  failure.
>  To make it correct, we would always add "taskset mask" (or other 
>  ways)
>  before DPDK application cmd lines.
> 
>  How do you think?
> >>> I still think it sounds like something that should be done by default
> >>> and maybe be overridable with some flag, rather than the other way
> >>> around. Another alternative might be detecting the cores always but if
> >>> running as root, override but with a warning.
> >> For your second solution, only root can setaffinity to isolcpus?
> >> Your first solution seems like a promising way for me.
> >>
> >>> But I dont know, just wondering. To look at it from another angle: why
> >>> would somebody use this new --avail-cores option and in what
> >>> situation, if things "just work" otherwise anyway?
> >> For DPDK applications, the most common case to initialize DPDK is like
> >> this: "$dpdk-app [options for DPDK] -- [options for app]", so users 
> >> need
> >> to specify which cores to run and how much hugepages are used. Suppose
> >> we need this dpdk-app to run in a container, users already give those
> >> information when they build up the cgroup for it to run inside, this
> >> option or this patch is to make DPDK more smart to discover how much
> >> resource will be used. Make sense?
> > But then, all we need might be just a script that would extract this 
> > information from the system
> > and form a proper cmdline parameter for DPDK?
>  Yes, a script will work. Or to construct (argc, argv) to call
>  rte_eal_init() in the application. But as Neil Horman once suggested, a
>  simple pthread_getaffinity_np() will get all things done. So if it worth
>  a patch here?
> >>> Don't know...
> >>> Personally I would prefer not to put extra logic inside EAL.
> >>> For me - there are too many different options already.
> >> Then how about make it default in rte_eal_cpu_init()? And it is already
> >> known it will bring trouble to those use isolcpus users, they need to
> >> add "taskset [mask]" before starting a DPDK app.
> > As I said - provide a script?
> 
> Yes. But what I want to say is that this script is hard to get right if
> there are different kinds of limitations. (That barely happens, though :-) )

My thought was to keep the dpdk code untouched - i.e. let it still blindly
call pthread_setaffinity_np() based on the input parameters, and in addition
provide a script for those who want to run in '--avail-cores' mode.
So it could do 'taskset -p $$' and then either form the -c parameter list for
the app, or check the existing -c/-l/--lcores parameters and complain if a
disallowed pcpu is detected.
But ok, it might be easier and more convenient to have this logic inside EAL
than in a separate script.

> 
> > Same might be for amount of hugepage memory available to the user?
> 
> Ditto. Limitations like hugetlbfs quota, cgroup hugetlb, some are used
> by app themself (more like an artificial argument) ...
> >
> >>>   From other side looking at the patch itself:
> >>> You are updating lcore_count and lcore_config[],based on physical cpu 
> >>> availability,
> >>> but these days it is not always one-to-one mapping between EAL lcore and 
> >>> physical cpu.
> >>> Shouldn't that be taken into account?
> >> I have not see the 

[dpdk-dev] [PATCH] mempool: avoid memory waste with large pagesize

2016-03-09 Thread Olivier MATZ
On 03/09/2016 03:29 AM, Stephen Hemminger wrote:
> If page size is large (like 64K on ARM) and object size is small
> then don't waste lots of memory by rounding up to page size.
> Instead, round up so that 1 or more objects all fit in a page.
> 
> This preserves the requirement that an object must not span a page,
> or virt2phys would break, and makes sure 62K is not wasted per mbuf.

You should specify that it only affects runs with "--no-huge".


> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -300,18 +300,24 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
> flags,
>   if (! rte_eal_has_hugepages()) {
>   /*
>* compute trailer size so that pool elements fit exactly in
> -  * a standard page
> +  * a standard page. If elements are smaller than a page
> +  * then allow multiple elements per page
>*/
> - int page_size = getpagesize();
> - int new_size = page_size - sz->header_size - sz->elt_size;
> - if (new_size < 0 || (unsigned int)new_size < sz->trailer_size) {
> + unsigned page_size = getpagesize();
> + uint32_t orig_size, new_size;
> +
> + orig_size = sz->header_size + sz->elt_size;
> + new_size = rte_align32pow2(orig_size);
> + if (new_size > page_size) {
>   printf("When hugepages are disabled, pool objects "
>  "can't exceed PAGE_SIZE: %d + %d + %d > %d\n",
>  sz->header_size, sz->elt_size, sz->trailer_size,
>  page_size);
>   return 0;
>   }
> - sz->trailer_size = new_size;
> +
> + sz->trailer_size = (new_size - orig_size)
> + / (page_size / new_size);
>   }

It looks like it does not work; did I miss something?

Examples:

# start with --no-huge
mp = rte_mempool_create("test", 128,
35, 0, 0, NULL, NULL, NULL, NULL, SOCKET_ID_ANY, 0);
rte_mempool_dump(stdout, mp);

shows:
  header_size=64
  elt_size=40
  trailer_size=0
  total_obj_size=104   < should be 128?


# start with --no-huge
mp = rte_mempool_create("test", 128,
191, 0, 0, NULL, NULL, NULL, NULL, SOCKET_ID_ANY,
MEMPOOL_F_NO_CACHE_ALIGN);
rte_mempool_dump(stdout, mp);

shows:
  header_size=8
  elt_size=192
  trailer_size=3
  total_obj_size=203   < should be 256?
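
For what it's worth, assuming a 4K page, my reading of the new code gives
for the first case: orig_size = 64 + 40 = 104, new_size =
rte_align32pow2(104) = 128, trailer_size = (128 - 104) / (4096 / 128) =
24 / 32 = 0, hence total_obj_size = 104 instead of 128. The second case
gives (256 - 200) / (4096 / 256) = 56 / 16 = 3, hence 203 instead of 256.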


The RFC I've just submitted also aims to fix this issue
(but differently).

Regards,
Olivier


[dpdk-dev] [PATCH v3] doc: add Vector FM10K introductions

2016-03-09 Thread Thomas Monjalon
> > From: "Chen Jing D(Mark)" 
> 
> Acked-by: John McNamara 

Applied, thanks

Next step: fill the matrix in overview.rst :)


[dpdk-dev] [PATCH] pcap: fix captured frame length

2016-03-09 Thread Bruce Richardson
On Thu, Jan 28, 2016 at 06:14:45PM +, Nicolas Pernas Maradei wrote:
> Hi Dror,
> 
> Good catch. What you are saying makes sense and it is also explained in
> pcap's documentation. Was your setup unusual though?
> This might sound like a silly question but I don't remember seeing that
> issue and I should have since your fix is correct.
> 
> Nico.
> 
Applied to dpdk-next-net/rel_16_04

/Bruce



[dpdk-dev] [PATCH v3] docs: add statistics read frequency to fm10k guide

2016-03-09 Thread Thomas Monjalon
2016-03-08 17:16, Harry van Haaren:
> This patch documents that the statistics of fm10k based NICs must be
> read regularly in order to avoid an undetected 32 bit integer-overflow.
> 
> Signed-off-by: Harry van Haaren 
> Acked-by: John McNamara 

Applied, thanks


[dpdk-dev] [PATCH v3] doc/nic: add ixgbe statistics on read frequency

2016-03-09 Thread Thomas Monjalon
2016-03-08 14:29, Harry van Haaren:
> This patch adds a note to the ixgbe PMD guide, stating
> the minimum time that statistics must be polled from
> the hardware in order to avoid register values becoming
> saturated and "sticking" to the max value.
> 
> Reported-by: Jerry Zhang 
> Tested-by: Marcin Kerlin 
> Signed-off-by: Harry van Haaren 
> Acked-by: Marcin Kerlin 

Applied, thanks


[dpdk-dev] [PATCH] doc: add szedata2 features into networking driver matrix

2016-03-09 Thread Thomas Monjalon
> Signed-off-by: Matej Vido 
> ---
>  doc/guides/nics/overview.rst | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

Applied, thanks


[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Kulasek, TomaszX


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, March 9, 2016 18:07
> To: Kulasek, TomaszX 
> Cc: dev at dpdk.org; Ananyev, Konstantin 
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> 2016-03-09 16:35, Kulasek, TomaszX:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > +void
> > > > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts,
> > > > +uint16_t
> > > unsent,
> > > > +   void *userdata);
> > >
> > > What about rte_eth_tx_buffer_default_callback as name?
> >
> > This function is used now as the default to count silently dropped packets
> > and update the error counter in the tx_buffer structure. When I remove the
> > error counter and set silent drop as the default behavior, it's better to
> > have two callbacks to choose from:
> >
> > 1) silently dropping packets (set as default)
> > 2) as defined above, dropping with a counter.
> >
> > Maybe it is better to define two default callbacks, while many
> > applications can still update their internal error counter, so IMHO these
> > names are more descriptive:
> >
> > rte_eth_tx_buffer_drop_callback
> > rte_eth_tx_buffer_count_callback
> >
> > What you think?
> 
> I think you are right about the name.
> 
> Are you sure providing a "count" callback is needed?
> Is it just to refactor the examples?

I think it's useful to have a callback which lets you easily track the
overall number of dropped packets. It's handy when you want to drop packets
but not leave them untracked.
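
As a sketch of what the "count" variant could look like (the name follows
the discussion above but is not the final API):

#include <rte_mbuf.h>

/* Free the unsent packets and add their number to a counter passed in
 * via userdata. */
void
rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
		void *userdata)
{
	uint64_t *count = userdata;
	uint16_t i;

	for (i = 0; i < unsent; i++)
		rte_pktmbuf_free(pkts[i]);

	*count += unsent;
}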

It's good to have it, but it's not critical.

Changing the examples is not a problem while I've got copy-paste superpower.

Tomasz


[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Thomas Monjalon
2016-03-09 16:35, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > +void
> > > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t
> > unsent,
> > > + void *userdata);
> > 
> > What about rte_eth_tx_buffer_default_callback as name?
> 
> This function is used now as the default to count silently dropped packets
> and update the error counter in the tx_buffer structure. When I remove the
> error counter and set silent drop as the default behavior, it's better to
> have two callbacks to choose from:
> 
> 1) silently dropping packets (set as default)
> 2) as defined above, dropping with a counter.
> 
> Maybe it is better to define two default callbacks, while many applications
> can still update their internal error counter,
> So IMHO these names are more descriptive:
> 
> rte_eth_tx_buffer_drop_callback
> rte_eth_tx_buffer_count_callback
> 
> What you think?

I think you are right about the name.

Are you sure providing a "count" callback is needed?
Is it just to refactor the examples?


[dpdk-dev] [PATCH v3 0/2] doc: add i40e pmd driver introduction

2016-03-09 Thread Thomas Monjalon
2016-03-09 15:28, Jingjing Wu:
> A new doc "i40e.rst" is added to introduce i40e pmd driver.
> 
> v3 changes:
>  - update table in overview.rst.
>  - rework index.rst to keep an alphabetical order.
> 
> v2 changes:
>  - restrict long code line
>  - fix typo
> 
> Jingjing Wu (2):
>   doc: add doc for i40e pmd driver introduction
>   doc: add i40e to overview table

Applied as one patch, thanks


[dpdk-dev] [PATCH v5 2/2] lpm: added a new rte_lpm_config structure for ipv4

2016-03-09 Thread Michal Jastrzebski
From: Michal Kobylinski 

This patch depends on: lpm: extended ipv4 next_hop field (v4).

A new rte_lpm_config structure is used so the LPM library will allocate
exactly the amount of memory necessary to hold the application's
rules.

Signed-off-by: Michal Kobylinski 
Acked-by: David Hunt 
---
 app/test/test_func_reentrancy.c |   9 +-
 app/test/test_lpm.c | 145 
 app/test/test_mp_secondary.c|   7 +-
 app/test/test_table_combined.c  |   2 +
 app/test/test_table_tables.c|   2 +
 doc/guides/rel_notes/release_16_04.rst  |   5 +
 examples/ip_fragmentation/main.c|   7 +-
 examples/ip_reassembly/main.c   |   7 +-
 examples/l3fwd-power/main.c |  10 +-
 examples/l3fwd-vf/main.c|  10 +-
 examples/l3fwd/l3fwd_lpm.c  |   9 +-
 examples/load_balancer/init.c   |   8 +-
 examples/performance-thread/l3fwd-thread/main.c |   8 +-
 lib/librte_lpm/rte_lpm.c|  51 ++---
 lib/librte_lpm/rte_lpm.h|  29 +++--
 lib/librte_table/rte_table_lpm.c|  12 +-
 lib/librte_table/rte_table_lpm.h|   6 +
 17 files changed, 264 insertions(+), 63 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index dbecc52..5d09296 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -359,6 +359,11 @@ lpm_create_free(__attribute__((unused)) void *arg)
 {
unsigned lcore_self = rte_lcore_id();
struct rte_lpm *lpm;
+   struct rte_lpm_config config;
+
+   config.max_rules = 4;
+   config.number_tbl8s = 256;
+   config.flags = 0;
char lpm_name[MAX_STRING_SIZE];
int i;

@@ -366,7 +371,7 @@ lpm_create_free(__attribute__((unused)) void *arg)

/* create the same lpm simultaneously on all threads */
for (i = 0; i < MAX_ITER_TIMES; i++) {
-   lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, 4, 0);
+   lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, &config);
if ((NULL == lpm) && (rte_lpm_find_existing("fr_test_once") == 
NULL))
return -1;
}
@@ -374,7 +379,7 @@ lpm_create_free(__attribute__((unused)) void *arg)
/* create mutiple fbk tables simultaneously */
for (i = 0; i < MAX_LPM_ITER_TIMES; i++) {
snprintf(lpm_name, sizeof(lpm_name), "fr_test_%d_%d", 
lcore_self, i);
-   lpm = rte_lpm_create(lpm_name, SOCKET_ID_ANY, 4, 0);
+   lpm = rte_lpm_create(lpm_name, SOCKET_ID_ANY, &config);
if (NULL == lpm)
return -1;

diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
index f367553..aaf95ec 100644
--- a/app/test/test_lpm.c
+++ b/app/test/test_lpm.c
@@ -105,6 +105,7 @@ rte_lpm_test tests[] = {
 #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
 #define MAX_DEPTH 32
 #define MAX_RULES 256
+#define NUMBER_TBL8S 256
 #define PASS 0

 /*
@@ -115,18 +116,25 @@ int32_t
 test0(void)
 {
struct rte_lpm *lpm = NULL;
+   struct rte_lpm_config config;
+
+   config.max_rules = MAX_RULES;
+   config.number_tbl8s = NUMBER_TBL8S;
+   config.flags = 0;

/* rte_lpm_create: lpm name == NULL */
-   lpm = rte_lpm_create(NULL, SOCKET_ID_ANY, MAX_RULES, 0);
+   lpm = rte_lpm_create(NULL, SOCKET_ID_ANY, &config);
TEST_LPM_ASSERT(lpm == NULL);

/* rte_lpm_create: max_rules = 0 */
/* Note: __func__ inserts the function name, in this case "test0". */
-   lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, 0, 0);
+   config.max_rules = 0;
+   lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
TEST_LPM_ASSERT(lpm == NULL);

/* socket_id < -1 is invalid */
-   lpm = rte_lpm_create(__func__, -2, MAX_RULES, 0);
+   config.max_rules = MAX_RULES;
+   lpm = rte_lpm_create(__func__, -2, &config);
TEST_LPM_ASSERT(lpm == NULL);

return PASS;
@@ -140,11 +148,16 @@ int32_t
 test1(void)
 {
struct rte_lpm *lpm = NULL;
+   struct rte_lpm_config config;
+
+   config.number_tbl8s = NUMBER_TBL8S;
+   config.flags = 0;
int32_t i;

/* rte_lpm_free: Free NULL */
for (i = 0; i < 100; i++) {
-   lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, MAX_RULES - i, 0);
+   config.max_rules = MAX_RULES - i;
+   lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
TEST_LPM_ASSERT(lpm != NULL);

rte_lpm_free(lpm);
@@ -164,8 +177,13 @@ int32_t
 test2(void)
 {
struct rte_lpm *lpm = NULL;
+   struct rte_lpm_config config;
+
+   config.max_rules = MAX_RULES;
+   config.number_tbl8s = NUMBER_TBL8S;
+   config.flags = 0;

-   lpm = rte_lpm_create(__func__, 

[dpdk-dev] [PATCH v5 1/2] lpm: extended ipv4 next_hop field

2016-03-09 Thread Michal Jastrzebski
From: Michal Kobylinski 

This patch extends the next_hop field from 8 bits to 24 bits in the LPM
library for IPv4.

Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.

Signed-off-by: Michal Kobylinski 
Acked-by: David Hunt 
---
 app/test/test_lpm.c |  122 +--
 doc/guides/rel_notes/release_16_04.rst  |3 +
 examples/ip_fragmentation/main.c|   16 +-
 examples/ip_reassembly/main.c   |   15 +-
 examples/l3fwd-power/main.c |2 +-
 examples/l3fwd-vf/main.c|2 +-
 examples/l3fwd/l3fwd_em_sse.h   |2 +-
 examples/l3fwd/l3fwd_lpm.h  |6 +-
 examples/l3fwd/l3fwd_lpm_sse.h  |   24 +-
 examples/l3fwd/l3fwd_sse.h  |8 +-
 examples/load_balancer/runtime.c|2 +-
 examples/performance-thread/l3fwd-thread/main.c |   33 +-
 lib/librte_lpm/rte_lpm.c| 1090 ---
 lib/librte_lpm/rte_lpm.h|  200 +++--
 lib/librte_lpm/rte_lpm_version.map  |   11 +
 lib/librte_table/rte_table_lpm.c|   15 +-
 16 files changed, 1274 insertions(+), 277 deletions(-)

diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
index 8b4ded9..f367553 100644
--- a/app/test/test_lpm.c
+++ b/app/test/test_lpm.c
@@ -57,7 +57,7 @@
} \
 } while(0)

-typedef int32_t (* rte_lpm_test)(void);
+typedef int32_t (*rte_lpm_test)(void);

 static int32_t test0(void);
 static int32_t test1(void);
@@ -180,8 +180,8 @@ int32_t
 test3(void)
 {
struct rte_lpm *lpm = NULL;
-   uint32_t ip = IPv4(0, 0, 0, 0);
-   uint8_t depth = 24, next_hop = 100;
+   uint32_t ip = IPv4(0, 0, 0, 0), next_hop = 100;
+   uint8_t depth = 24;
int32_t status = 0;

/* rte_lpm_add: lpm == NULL */
@@ -247,8 +247,7 @@ test5(void)
 {
 #if defined(RTE_LIBRTE_LPM_DEBUG)
struct rte_lpm *lpm = NULL;
-   uint32_t ip = IPv4(0, 0, 0, 0);
-   uint8_t next_hop_return = 0;
+   uint32_t ip = IPv4(0, 0, 0, 0), next_hop_return = 0;
int32_t status = 0;

/* rte_lpm_lookup: lpm == NULL */
@@ -277,8 +276,8 @@ int32_t
 test6(void)
 {
struct rte_lpm *lpm = NULL;
-   uint32_t ip = IPv4(0, 0, 0, 0);
-   uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+   uint32_t ip = IPv4(0, 0, 0, 0), next_hop_add = 100, next_hop_return = 0;
+   uint8_t depth = 24;
int32_t status = 0;

lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, MAX_RULES, 0);
@@ -309,10 +308,10 @@ int32_t
 test7(void)
 {
__m128i ipx4;
-   uint16_t hop[4];
+   uint32_t hop[4];
struct rte_lpm *lpm = NULL;
-   uint32_t ip = IPv4(0, 0, 0, 0);
-   uint8_t depth = 32, next_hop_add = 100, next_hop_return = 0;
+   uint32_t ip = IPv4(0, 0, 0, 0), next_hop_add = 100, next_hop_return = 0;
+   uint8_t depth = 32;
int32_t status = 0;

lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, MAX_RULES, 0);
@@ -325,10 +324,10 @@ test7(void)
TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));

ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip);
-   rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX);
+   rte_lpm_lookupx4(lpm, ipx4, hop, UINT32_MAX);
TEST_LPM_ASSERT(hop[0] == next_hop_add);
-   TEST_LPM_ASSERT(hop[1] == UINT16_MAX);
-   TEST_LPM_ASSERT(hop[2] == UINT16_MAX);
+   TEST_LPM_ASSERT(hop[1] == UINT32_MAX);
+   TEST_LPM_ASSERT(hop[2] == UINT32_MAX);
TEST_LPM_ASSERT(hop[3] == next_hop_add);

status = rte_lpm_delete(lpm, ip, depth);
@@ -355,10 +354,11 @@ int32_t
 test8(void)
 {
__m128i ipx4;
-   uint16_t hop[4];
+   uint32_t hop[4];
struct rte_lpm *lpm = NULL;
uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0);
-   uint8_t depth, next_hop_add, next_hop_return;
+   uint32_t next_hop_add, next_hop_return;
+   uint8_t depth;
int32_t status = 0;

lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, MAX_RULES, 0);
@@ -381,10 +381,10 @@ test8(void)
(next_hop_return == next_hop_add));

ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1);
-   rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX);
-   TEST_LPM_ASSERT(hop[0] == UINT16_MAX);
+   rte_lpm_lookupx4(lpm, ipx4, hop, UINT32_MAX);
+   TEST_LPM_ASSERT(hop[0] == UINT32_MAX);
TEST_LPM_ASSERT(hop[1] == next_hop_add);
-   TEST_LPM_ASSERT(hop[2] == UINT16_MAX);
+   TEST_LPM_ASSERT(hop[2] == UINT32_MAX);
TEST_LPM_ASSERT(hop[3] == next_hop_add);
}

@@ -400,8 +400,7 @@ test8(void)
if (depth != 1) {
   

[dpdk-dev] [PATCH v5 0/2] Increased number of next hops for LPM IPv4.

2016-03-09 Thread Michal Jastrzebski
From: Michal Kobylinski 

This patchset extends the next_hop field from 8 bits to 24 bits in the LPM
library for IPv4.

As the next_hop field is increased, the maximum number of tbl8s is now 2^24.
A new rte_lpm_config structure is used so the LPM library will allocate
exactly the amount of memory necessary to hold the application's rules.

Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.
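
For reference, a minimal usage sketch of the new config-based create call
(field names as used in patch 2/2; the values are only examples):

#include <rte_lpm.h>
#include <rte_memory.h>

static struct rte_lpm *
create_example_lpm(void)
{
	struct rte_lpm_config config;

	config.max_rules = 1024;	/* rules the application needs to hold */
	config.number_tbl8s = 256;	/* tbl8 groups, now up to 2^24 */
	config.flags = 0;

	return rte_lpm_create("example_lpm", SOCKET_ID_ANY, &config);
}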

Michal Kobylinski (2):
  lpm: extended ipv4 next_hop field
  lpm: added a new rte_lpm_config structure for ipv4

 app/test/test_func_reentrancy.c |9 +-
 app/test/test_lpm.c |  267 --
 app/test/test_mp_secondary.c|7 +-
 app/test/test_table_combined.c  |2 +
 app/test/test_table_tables.c|2 +
 doc/guides/rel_notes/release_16_04.rst  |8 +
 examples/ip_fragmentation/main.c|   23 +-
 examples/ip_reassembly/main.c   |   22 +-
 examples/l3fwd-power/main.c |   12 +-
 examples/l3fwd-vf/main.c|   12 +-
 examples/l3fwd/l3fwd_em_sse.h   |2 +-
 examples/l3fwd/l3fwd_lpm.c  |9 +-
 examples/l3fwd/l3fwd_lpm.h  |6 +-
 examples/l3fwd/l3fwd_lpm_sse.h  |   24 +-
 examples/l3fwd/l3fwd_sse.h  |8 +-
 examples/load_balancer/init.c   |8 +-
 examples/load_balancer/runtime.c|2 +-
 examples/performance-thread/l3fwd-thread/main.c |   41 +-
 lib/librte_lpm/rte_lpm.c| 1107 ---
 lib/librte_lpm/rte_lpm.h|  227 +++--
 lib/librte_lpm/rte_lpm_version.map  |   11 +
 lib/librte_table/rte_table_lpm.c|   27 +-
 lib/librte_table/rte_table_lpm.h|6 +
 23 files changed, 1520 insertions(+), 322 deletions(-)

-- 
1.9.1



[dpdk-dev] [RFC 00/35] mempool: rework memory allocation

2016-03-09 Thread Olivier MATZ

On 03/09/2016 05:19 PM, Olivier Matz wrote:
> This series is a rework of mempool.
> 
> [...]

I forgot to mention that this series applies on top of Keith's
patch, which is also planned for 16.07:
http://www.dpdk.org/dev/patchwork/patch/10492/


Olivier


[dpdk-dev] [PATCH v4 12/12] docs: add release note for qtest virtio container support

2016-03-09 Thread Tetsuya Mukawa
Signed-off-by: Tetsuya Mukawa 
---
 doc/guides/rel_notes/release_16_04.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index e3142f2..1c8c6b2 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -61,6 +61,9 @@ This section should contain new features added in this 
release. Sample format:

   Add a new virtual device, named eth_cvio, to support virtio for containers.

+* **Virtio support for containers using QEMU qtest mode.**
+  Add a new virtual device, named eth_qtest_virtio, to support virtio for 
containers
+  using QEMU qtest mode.

 Resolved Issues
 ---
-- 
2.1.4



[dpdk-dev] [PATCH v4 11/12] virtio: Add QTest support for virtio-net PMD

2016-03-09 Thread Tetsuya Mukawa
The patch adds a new virtio-net PMD configuration that allows the PMD to
work on the host as if the PMD were in a VM.
Here is the new configuration for the virtio-net PMD.
 - CONFIG_RTE_VIRTIO_VDEV_QTEST
To use this mode, EAL needs to map all hugepages as one file. Also, the file
should be mapped between (1 << 31) and (1 << 44), and the start address
should be aligned to the EAL memory size.

To allocate like above, use below options.
 --single-file
 --range-virtaddr=0x8000-0x1000
 --align-memsize
If a free region cannot be found, EAL will return error.

To prepare virtio-net device on host, the users need to invoke QEMU
process in special QTest mode. This mode is mainly used for testing QEMU
devices from outer process. In this mode, no guest runs.
Here is QEMU command line.

 $ qemu-system-x86_64 \
 -machine pc-i440fx-1.4,accel=qtest \
 -display none -qtest-log /dev/null \
 -qtest unix:/tmp/socket,server \
 -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
 -device
virtio-net-pci,netdev=net0,mq=on,disable-modern=false,addr=3 \
 -chardev socket,id=chr1,path=/tmp/ivshmem,server \
 -device ivshmem,size=1G,chardev=chr1,vectors=1,addr=4

 * Should use QEMU 2.5.1 or above.
 * One QEMU process is needed per port.
 * Only virtio-1.0 devices are supported.
 * The vhost backends like vhost-net and vhost-user can be specified.
 * In most cases, just using the above command is enough, but you can also
   specify other QEMU virtio-net options like the mac address.
 * Only the "pc-i440fx-1.4" machine was checked, but it may work with other
   machines.
 * Should not add "--enable-kvm" to the QEMU command line.

After invoking QEMU, the PMD can connect to QEMU process using unix
domain sockets. Over these sockets, virtio-net, ivshmem and piix3
device in QEMU are probed by the PMD.
Here is example of command line.

 $ testpmd -c f -n 1 -m 1024 --no-pci --single-file \
  --range-virtaddr=0x8000-0x1000 --align-memsize \
  --vdev="eth_qtest_virtio0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem"\
  -- --disable-hw-vlan --txqflags=0xf00 -i

Please specify the same unix domain sockets and memory size in both the QEMU
and DPDK command lines, as above.
The shared memory size should be a power of 2, because ivshmem only
accepts such memory sizes.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/qtest.h |  55 +
 drivers/net/virtio/virtio_ethdev.c | 457 -
 2 files changed, 501 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/qtest.h b/drivers/net/virtio/qtest.h
index 46b9ee6..421e62c 100644
--- a/drivers/net/virtio/qtest.h
+++ b/drivers/net/virtio/qtest.h
@@ -35,5 +35,60 @@
 #define _VIRTIO_QTEST_H_

 #define QTEST_DRV_NAME "eth_qtest_virtio"
+#define QTEST_DEVICE_NUM3
+
+#include 
+
+/* Device information */
+#define VIRTIO_NET_DEVICE_ID0x1000
+#define VIRTIO_NET_VENDOR_ID0x1af4
+#define VIRTIO_NET_IRQ_NUM  10
+#define IVSHMEM_DEVICE_ID   0x1110
+#define IVSHMEM_VENDOR_ID   0x1af4
+#define PIIX3_DEVICE_ID 0x7000
+#define PIIX3_VENDOR_ID 0x8086
+
+/* 
+ * IO port mapping of qtest guest
+ * 
+ * 0x - 0xbfff : not used
+ * 0xc000 - 0xc03f : virtio-net(BAR0)
+ * 0xc040 - 0x : not used
+ *
+ * 
+ * Memory mapping of qtest quest
+ * 
+ * 0x_ - 0x_3fff : not used
+ * 0x_4000 - 0x_4fff : virtio-net(BAR1)
+ * 0x_40001000 - 0x_40ff : not used
+ * 0x_4100 - 0x_417f : virtio-net(BAR4)
+ * 0x_4180 - 0x_41ff : not used
+ * 0x_4200 - 0x_42ff : ivshmem(BAR0)
+ * 0x_42000100 - 0x_42ff : not used
+ * 0x_8000 - 0x_ : ivshmem(BAR2)
+ *
+ * We can only specify start address of a region. The region size
+ * will be defined by the device implementation in QEMU.
+ * The size will be pow of 2 according to the PCI specification.
+ * Also, the region start address should be aligned by region size.
+ *
+ * BAR2 of ivshmem will be used to mmap DPDK application memory.
+ * So this address will be dynamically changed, but not to overlap
+ * others, it should be mmaped between above addresses. Such allocation
+ * is done by EAL. Check rte_eal_get_free_region() also.
+ */
+#define VIRTIO_NET_IO_START 0xc000
+#define VIRTIO_NET_MEMORY1_START   0x4000
+#define VIRTIO_NET_MEMORY2_START   0x4100
+#define IVSHMEM_MEMORY_START0x4200
+
+static inline struct rte_pci_id
+qtest_get_pci_id_of_virtio_net(void)
+{
+   struct rte_pci_id id =  {VIRTIO_NET_DEVICE_ID,
+   VIRTIO_NET_VENDOR_ID, 

[dpdk-dev] [PATCH v4 10/12] virtio: Add QTest support to vtpci abstraction

2016-03-09 Thread Tetsuya Mukawa
The patch adds QTest support to the vtpci abstraction.
With this patch, only modern virtio devices will be supported.
This QTest support will be used by the later QTest extension patch of
the virtio-net PMD.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/qtest.h |  39 
 drivers/net/virtio/virtio_ethdev.c |   2 +-
 drivers/net/virtio/virtio_pci.c| 368 ++---
 drivers/net/virtio/virtio_pci.h|   9 +-
 4 files changed, 387 insertions(+), 31 deletions(-)
 create mode 100644 drivers/net/virtio/qtest.h

diff --git a/drivers/net/virtio/qtest.h b/drivers/net/virtio/qtest.h
new file mode 100644
index 000..46b9ee6
--- /dev/null
+++ b/drivers/net/virtio/qtest.h
@@ -0,0 +1,39 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_QTEST_H_
+#define _VIRTIO_QTEST_H_
+
+#define QTEST_DRV_NAME "eth_qtest_virtio"
+
+#endif /* _VIRTIO_QTEST_H_ */
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index bc631c7..747596d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1055,7 +1055,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
pci_dev = eth_dev->pci_dev;

if (virtio_dev_check(eth_dev, RTE_ETH_DEV_PCI, NULL, 0)) {
-   if (vtpci_init(pci_dev, hw) < 0)
+   if (vtpci_init(eth_dev, hw) < 0)
return -1;
}

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 85fbe88..e88531e 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -37,10 +37,16 @@
  #include 
 #endif

+#include "virtio_ethdev.h"
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"

+#ifdef RTE_VIRTIO_VDEV_QTEST
+#include "qtest.h"
+#include "qtest_utils.h"
+#endif
+
 /*
  * Following macros are derived from linux/pci_regs.h, however,
  * we can't simply include that header here, as there is no such
@@ -440,6 +446,220 @@ static const struct virtio_pci_ops modern_ops = {
 };


+#ifdef RTE_VIRTIO_VDEV_QTEST
+static inline uint8_t
+qtest_read8(struct virtio_hw *hw, uint8_t *addr)
+{
+   return qtest_read(hw->qsession, (uint64_t)addr, 'b');
+}
+
+static inline void
+qtest_write8(struct virtio_hw *hw, uint8_t val, uint8_t *addr)
+{
+   return qtest_write(hw->qsession, (uint64_t)addr, val, 'b');
+}
+
+static inline uint16_t
+qtest_read16(struct virtio_hw *hw, uint16_t *addr)
+{
+   return qtest_read(hw->qsession, (uint64_t)addr, 'w');
+}
+
+static inline void
+qtest_write16(struct virtio_hw *hw, uint16_t val, uint16_t *addr)
+{
+   return qtest_write(hw->qsession, (uint64_t)addr, val, 'w');
+}
+
+static inline uint32_t
+qtest_read32(struct virtio_hw *hw, uint32_t *addr)
+{
+   return qtest_read(hw->qsession, (uint64_t)addr, 'l');
+}
+
+static inline void
+qtest_write32(struct virtio_hw *hw, uint32_t val, uint32_t *addr)
+{
+   return qtest_write(hw->qsession, (uint64_t)addr, val, 'l');
+}
+
+static inline void
+qtest_write64_twopart(struct virtio_hw *hw,
+   uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+   qtest_write32(hw, val & ((1ULL << 32) - 1), lo);
+   qtest_write32(hw, val >> 32, hi);
+}
+
+static void
+qtest_modern_read_dev_config(struct virtio_hw *hw, 

[dpdk-dev] [PATCH v4 09/12] virtio, qtest: Add misc functions to handle pci information

2016-03-09 Thread Tetsuya Mukawa
The patch adds below functions.
 - qtest_read_pci_cfg
 - qtest_get_bar
 - qtest_get_bar_addr
 - qtest_get_bar_size
These are used for handling pci device information.
They will be called by later patches.
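For illustration only (not part of the patch), a caller could combine the
helpers roughly as below; the qtest session pointer 's' and the device name
string are assumptions:

 	/* Hypothetical usage sketch of the new helpers */
 	uint64_t *bar_addr;
 	uint64_t bar_size;

 	if (qtest_get_bar_addr(s, "virtio-net", 0, &bar_addr) < 0 ||
 			qtest_get_bar_size(s, "virtio-net", 0, &bar_size) < 0)
 		return -1;
 	PMD_DRV_LOG(DEBUG, "BAR0 addr=%p, size=0x%lx\n",
 		(void *)bar_addr, (unsigned long)bar_size);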

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/qtest_utils.c | 77 
 drivers/net/virtio/qtest_utils.h | 56 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/net/virtio/qtest_utils.c b/drivers/net/virtio/qtest_utils.c
index 337546a..55ed504 100644
--- a/drivers/net/virtio/qtest_utils.c
+++ b/drivers/net/virtio/qtest_utils.c
@@ -427,6 +427,83 @@ qtest_find_device(struct qtest_session *s, const char 
*name)
return NULL;
 }

+/*
+ * The function is used for reading the pci configuration space of the specified device.
+ */
+int
+qtest_read_pci_cfg(struct qtest_session *s, const char *name,
+   void *buf, size_t len, off_t offset)
+{
+   struct qtest_pci_device *dev;
+   uint32_t i;
+   uint8_t *p = buf;
+
+   dev = qtest_find_device(s, name);
+   if (dev == NULL) {
+   PMD_DRV_LOG(ERR, "Cannot find specified device: %s\n", name);
+   return -1;
+   }
+
+   for (i = 0; i < len; i++) {
+   *(p + i) = qtest_pci_inb(s,
+   dev->bus_addr, dev->device_addr, 0, offset + i);
+   }
+
+   return 0;
+}
+
+static struct qtest_pci_bar *
+qtest_get_bar(struct qtest_session *s, const char *name, uint8_t bar)
+{
+   struct qtest_pci_device *dev;
+
+   if (bar >= NB_BAR) {
+   PMD_DRV_LOG(ERR, "Invalid bar is specified: %u\n", bar);
+   return NULL;
+   }
+
+   dev = qtest_find_device(s, name);
+   if (dev == NULL) {
+   PMD_DRV_LOG(ERR, "Cannot find specified device: %s\n", name);
+   return NULL;
+   }
+
+   if (dev->bar[bar].type == QTEST_PCI_BAR_DISABLE) {
+   PMD_DRV_LOG(ERR, "Cannot find valid BAR(%s): %u\n", name, bar);
+   return NULL;
+   }
+
+   return &dev->bar[bar];
+}
+
+int
+qtest_get_bar_addr(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t **addr)
+{
+   struct qtest_pci_bar *bar_ptr;
+
+   bar_ptr = qtest_get_bar(s, name, bar);
+   if (bar_ptr == NULL)
+   return -1;
+
+   *addr = (uint64_t *)bar_ptr->region_start;
+   return 0;
+}
+
+int
+qtest_get_bar_size(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t *size)
+{
+   struct qtest_pci_bar *bar_ptr;
+
+   bar_ptr = qtest_get_bar(s, name, bar);
+   if (bar_ptr == NULL)
+   return -1;
+
+   *size = bar_ptr->region_size;
+   return 0;
+}
+
 int
 qtest_intr_enable(struct qtest_session *s)
 {
diff --git a/drivers/net/virtio/qtest_utils.h b/drivers/net/virtio/qtest_utils.h
index 0717ee9..dfd2b03 100644
--- a/drivers/net/virtio/qtest_utils.h
+++ b/drivers/net/virtio/qtest_utils.h
@@ -270,6 +270,62 @@ void qtest_write(struct qtest_session *s, uint64_t addr,

 /**
  * @internal
+ * Read pci configuration space of QEMU guest.
+ *
+ * @param s
+ *   The pointer to qtest session structure.
+ * @param name
+ *   The name of pci device.
+ * @param buf
+ *   The pointer to the buffer.
+ * @param len
+ *   Length to read.
+ * @param offset
+ *   Offset of pci configuration space.
+ * @return
+ *   0 on success, negative on error
+ */
+int qtest_read_pci_cfg(struct qtest_session *s, const char *name,
+   void *buf, size_t len, off_t offset);
+
+/**
+ * @internal
+ * Get BAR address of a specified pci device.
+ *
+ * @param s
+ *   The pointer to qtest session structure.
+ * @param name
+ *   The name of pci device.
+ * @param bar
+ *   The index of BAR. Should be between 0 and 5.
+ * @param addr
+ *   The pointer to store BAR address.
+ * @return
+ *   0 on success, negative on error
+ */
+int qtest_get_bar_addr(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t **addr);
+
+/**
+ * @internal
+ * Get BAR size of a specified pci device.
+ *
+ * @param s
+ *   The pointer to qtest session structure.
+ * @param name
+ *   The name of pci device.
+ * @param bar
+ *   The index of BAR. Should be between 0 and 5.
+ * @param size
+ *   The pointer to store BAR size.
+ * @return
+ *   0 on success, negative on error
+ */
+int qtest_get_bar_size(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t *size);
+
+/**
+ * @internal
  * Initialization function of piix3 device.
  *
  * @param s
-- 
2.1.4



[dpdk-dev] [PATCH v4 08/12] virtio, qtest: Add functionality to handle interrupt

2016-03-09 Thread Tetsuya Mukawa
The patch adds functionality to handle interrupts from a pci device of
the QEMU guest. To handle the interrupts, the patch also initializes the
piix3 pci device.
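As background (a hedged sketch, not part of the patch): the piix3 PIRQ route
control registers at config offsets 0x60-0x63 (PIIX3_REG_ADDR_PIRQA..D below)
tell which ISA IRQ a PCI interrupt pin is routed to, which is why the device
needs to be probed. Roughly:

 	/* Sketch: read the PIRQA routing register of the piix3 device.
 	 * Bits 3:0 hold the IRQ number, bit 7 means routing is disabled.
 	 * 'piix3' is an assumed struct qtest_pci_device pointer. */
 	uint8_t pirqa = qtest_pci_inb(s, piix3->bus_addr, piix3->device_addr,
 			0, PIIX3_REG_ADDR_PIRQA);
 	int irqno = (pirqa & 0x80) ? -1 : (pirqa & 0x0f);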

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/qtest_utils.c | 225 ++-
 drivers/net/virtio/qtest_utils.h |  68 +++-
 2 files changed, 287 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/qtest_utils.c b/drivers/net/virtio/qtest_utils.c
index 338224a..337546a 100644
--- a/drivers/net/virtio/qtest_utils.c
+++ b/drivers/net/virtio/qtest_utils.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 

@@ -43,6 +44,12 @@
 #include "virtio_ethdev.h"
 #include "qtest_utils.h"

+/* PIIX3 configuration registers */
+#define PIIX3_REG_ADDR_PIRQA 0x60
+#define PIIX3_REG_ADDR_PIRQB 0x61
+#define PIIX3_REG_ADDR_PIRQC 0x62
+#define PIIX3_REG_ADDR_PIRQD 0x63
+
 /* ivshmem configuration */
 #define IVSHMEM_PROTOCOL_VERSION0

@@ -74,6 +81,14 @@ struct qtest_session {
size_t evq_total_len;

union qtest_pipefds msgfds;
+
+   int irqno;
+   pthread_t intr_th;
+   int intr_th_started;
+   int eventfd;
+   rte_atomic16_t enable_intr;
+   rte_intr_callback_fn cb;
+   void *cb_arg;
 };

 static int
@@ -230,6 +245,29 @@ qtest_pci_inb(struct qtest_session *s, uint8_t bus, 
uint8_t device,
return (tmp >> ((offset & 0x3) * 8)) & 0xff;
 }

+static void
+qtest_pci_outb(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset, uint8_t value)
+{
+   uint32_t addr, tmp, pos;
+
+   addr = PCI_CONFIG_ADDR(bus, device, function, offset);
+   pos = (offset % 4) * 8;
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, addr, 'l');
+   tmp = qtest_raw_in(s, 0xcfc, 'l');
+   tmp = (tmp & ~(0xff << pos)) | (value << pos);
+
+   qtest_raw_out(s, 0xcf8, addr, 'l');
+   qtest_raw_out(s, 0xcfc, tmp, 'l');
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+}
+
 static uint32_t
 qtest_pci_inl(struct qtest_session *s, uint8_t bus, uint8_t device,
uint8_t function, uint8_t offset)
@@ -389,15 +427,112 @@ qtest_find_device(struct qtest_session *s, const char 
*name)
return NULL;
 }

+int
+qtest_intr_enable(struct qtest_session *s)
+{
+   rte_atomic16_set(&s->enable_intr, 1);
+
+   return 0;
+}
+
+int
+qtest_intr_disable(struct qtest_session *s)
+{
+   rte_atomic16_set(&s->enable_intr, 0);
+
+   return 0;
+}
+
+void
+qtest_intr_callback_register(struct qtest_session *s,
+   rte_intr_callback_fn cb, void *cb_arg)
+{
+   s->cb = cb;
+   s->cb_arg = cb_arg;
+   rte_atomic16_set(&s->enable_intr, 1);
+}
+
+void
+qtest_intr_callback_unregister(struct qtest_session *s,
+   rte_intr_callback_fn cb __rte_unused,
+   void *cb_arg __rte_unused)
+{
+   rte_atomic16_set(&s->enable_intr, 0);
+   s->cb = NULL;
+   s->cb_arg = NULL;
+}
+
+static void *
+qtest_intr_handler(void *data) {
+   struct qtest_session *s = (struct qtest_session *)data;
+   eventfd_t value;
+   int ret;
+
+   for (;;) {
+   ret = eventfd_read(s->eventfd, &value);
+   if (ret < 0)
+   return NULL;
+   s->cb(NULL, s->cb_arg);
+   }
+   return NULL;
+}
+
+static int
+qtest_intr_initialize(struct qtest_session *s)
+{
+   char buf[64];
+   int ret;
+
+   snprintf(buf, sizeof(buf), "irq_intercept_in ioapic\n");
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   /* To enable interrupt, send "irq_intercept_in" message to QEMU */
+   ret = qtest_raw_send(s->qtest_socket, buf, strlen(buf));
+   if (ret < 0) {
+   pthread_mutex_unlock(&s->qtest_session_lock);
+   return -1;
+   }
+
+   /* just ignore QEMU response */
+   ret = qtest_raw_recv(s->msgfds.readfd, buf, sizeof(buf));
+   if (ret < 0) {
+   pthread_mutex_unlock(&s->qtest_session_lock);
+   return -1;
+   }
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return 0;
+}
+
 static void
 qtest_event_send(struct qtest_session *s, char *buf)
 {
+   char interrupt_message[32];
int ret;

-   /* relay normal message to pipe */
-   ret = qtest_raw_send(s->msgfds.writefd, buf, strlen(buf));
-   if (ret < 0)
-   rte_panic("cannot relay normal message\n");
+   /* This message will come when interrupt occurs */
+   snprintf(interrupt_message, sizeof(interrupt_message),
+   "IRQ raise %d", s->irqno);
+
+   if (strncmp(buf, interrupt_message,
+ 

[dpdk-dev] [PATCH v4 07/12] virtio, qtest: Add functionality to share memory between QTest guest

2016-03-09 Thread Tetsuya Mukawa
The patch adds functionality to share memory between the QTest guest and
the DPDK application using an ivshmem device.
The shared memory will be all EAL memory on hugepages. This memory will
be accessed by the QEMU vcpu and the DPDK application using the same address.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/qtest_utils.c | 106 ++-
 drivers/net/virtio/qtest_utils.h |   4 +-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/qtest_utils.c b/drivers/net/virtio/qtest_utils.c
index 000c7e8..338224a 100644
--- a/drivers/net/virtio/qtest_utils.c
+++ b/drivers/net/virtio/qtest_utils.c
@@ -43,6 +43,9 @@
 #include "virtio_ethdev.h"
 #include "qtest_utils.h"

+/* ivshmem configuration */
+#define IVSHMEM_PROTOCOL_VERSION 0
+
 #define PCI_CONFIG_ADDR(_bus, _device, _function, _offset) ( \
(1 << 31) | ((_bus) & 0xff) << 16 | ((_device) & 0x1f) << 11 | \
((_function) & 0x7) << 8 | ((_offset) & 0xfc))
@@ -59,6 +62,7 @@ union qtest_pipefds {

 struct qtest_session {
int qtest_socket;
+   int ivshmem_socket;
pthread_mutex_t qtest_session_lock;

struct qtest_pci_device_list head;
@@ -411,6 +415,7 @@ qtest_close_sockets(struct qtest_session *s)
	qtest_close_one_socket(&s->qtest_socket);
	qtest_close_one_socket(&s->msgfds.readfd);
	qtest_close_one_socket(&s->msgfds.writefd);
+   qtest_close_one_socket(&s->ivshmem_socket);
 }

 static void
@@ -716,6 +721,93 @@ qtest_register_target_devices(struct qtest_session *s,
 }

 static int
+qtest_send_message_to_ivshmem(int sock_fd, uint64_t client_id, int shm_fd)
+{
+   struct iovec iov;
+   struct msghdr msgh;
+   size_t fdsize = sizeof(int);
+   char control[CMSG_SPACE(fdsize)];
+   struct cmsghdr *cmsg;
+   int ret;
+
+   memset(&msgh, 0, sizeof(msgh));
+   iov.iov_base = &client_id;
+   iov.iov_len = sizeof(client_id);
+
+   msgh.msg_iov = &iov;
+   msgh.msg_iovlen = 1;
+
+   if (shm_fd >= 0) {
+   msgh.msg_control = &control;
+   msgh.msg_controllen = sizeof(control);
+   cmsg = CMSG_FIRSTHDR(&msgh);
+   cmsg->cmsg_len = CMSG_LEN(fdsize);
+   cmsg->cmsg_level = SOL_SOCKET;
+   cmsg->cmsg_type = SCM_RIGHTS;
+   memcpy(CMSG_DATA(cmsg), &shm_fd, fdsize);
+   }
+
+   do {
+   ret = sendmsg(sock_fd, &msgh, 0);
+   } while (ret < 0 && errno == EINTR);
+
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "sendmsg error\n");
+   return ret;
+   }
+
+   return ret;
+}
+
+static int
+qtest_setup_shared_memory(struct qtest_session *s)
+{
+   int shm_fd, num, ret;
+   struct back_file *huges;
+
+   num = rte_eal_get_backfile_info();
+   if (num != 1) {
+   PMD_DRV_LOG(ERR,
+   "Not supported memory configuration\n");
+   return -1;
+   }
+
+   shm_fd = open(huges[0].filepath, O_RDWR);
+   if (shm_fd < 0) {
+   PMD_DRV_LOG(ERR,
+   "Cannot open file: %s\n", huges[0].filepath);
+   return -1;
+   }
+
+   /* send our protocol version first */
+   ret = qtest_send_message_to_ivshmem(s->ivshmem_socket,
+   IVSHMEM_PROTOCOL_VERSION, -1);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR,
+   "Failed to send protocol version to ivshmem\n");
+   return -1;
+   }
+
+   /* send client id */
+   ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, 0, -1);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "Failed to send VMID to ivshmem\n");
+   return -1;
+   }
+
+   /* send message to ivshmem */
+   ret = qtest_send_message_to_ivshmem(s->ivshmem_socket, -1, shm_fd);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "Failed to send file descriptor to ivshmem\n");
+   return -1;
+   }
+
+   close(shm_fd);
+
+   return 0;
+}
+
+static int
 qtest_open_socket(char *path)
 {
struct sockaddr_un sa = {0};
@@ -769,7 +861,7 @@ qtest_vdev_uninit(struct qtest_session *s)
 }

 struct qtest_session *
-qtest_vdev_init(char *qtest_path,
+qtest_vdev_init(char *qtest_path, char *ivshmem_path,
struct qtest_pci_device *devices, int devnum)
 {
struct qtest_session *s;
@@ -800,6 +892,12 @@ qtest_vdev_init(char *qtest_path,
goto error;
}

+   s->ivshmem_socket = qtest_open_socket(ivshmem_path);
+   if (s->ivshmem_socket < 0) {
+   PMD_DRV_LOG(ERR, "Failed to open %s\n", ivshmem_path);
+   goto error;
+   }
+
s->qtest_socket = qtest_open_socket(qtest_path);
if (s->qtest_socket < 0) {
PMD_DRV_LOG(ERR, "Failed to open %s\n", qtest_path);
@@ -813,6 +911,12 @@ qtest_vdev_init(char *qtest_path,
}
s->event_th_started = 1;

+   ret = qtest_setup_shared_memory(s);
+   if (ret 

[dpdk-dev] [PATCH v4 06/12] virtio, qtest: Add pci device initialization function to qtest utils

2016-03-09 Thread Tetsuya Mukawa
The patch adds general pci device initialization functionality to
qtest utils. It initializes pci devices using qtest messaging.
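For reference (not part of the patch), the PCI_CONFIG_ADDR() macro added
below builds the dword written to I/O port 0xcf8; e.g. bus 0, device 3,
function 0, offset 0 gives 0x80001800. A config space read then looks
roughly like this sketch:

 	/* Sketch: read the vendor/device ID dword of bus 0, device 3, func 0 */
 	uint32_t addr = PCI_CONFIG_ADDR(0, 3, 0, 0);	/* == 0x80001800 */
 	qtest_raw_out(s, 0xcf8, addr, 'l');
 	uint32_t id = qtest_raw_in(s, 0xcfc, 'l');	/* device << 16 | vendor */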

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/qtest_utils.c | 349 ++-
 drivers/net/virtio/qtest_utils.h | 114 -
 2 files changed, 461 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/qtest_utils.c b/drivers/net/virtio/qtest_utils.c
index f4cd6af..000c7e8 100644
--- a/drivers/net/virtio/qtest_utils.c
+++ b/drivers/net/virtio/qtest_utils.c
@@ -43,6 +43,10 @@
 #include "virtio_ethdev.h"
 #include "qtest_utils.h"

+#define PCI_CONFIG_ADDR(_bus, _device, _function, _offset) ( \
+   (1 << 31) | ((_bus) & 0xff) << 16 | ((_device) & 0x1f) << 11 | \
+   ((_function) & 0x7) << 8 | ((_offset) & 0xfc))
+
 union qtest_pipefds {
struct {
int pipefd[2];
@@ -57,6 +61,8 @@ struct qtest_session {
int qtest_socket;
pthread_mutex_t qtest_session_lock;

+   struct qtest_pci_device_list head;
+
pthread_t event_th;
int event_th_started;
char *evq;
@@ -195,6 +201,119 @@ qtest_raw_write(struct qtest_session *s, uint64_t addr, 
uint32_t val, char type)
 }

 /*
+ * qtest_pci_inX/outX are used for accessing PCI configuration space.
+ * The functions are implemented based on PCI configuration space
+ * specification.
+ * According to the spec, the access size of read()/write() should be 4 bytes.
+ */
+static int
+qtest_pci_inb(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   tmp = qtest_raw_in(s, 0xcfc, 'l');
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return (tmp >> ((offset & 0x3) * 8)) & 0xff;
+}
+
+static uint32_t
+qtest_pci_inl(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   tmp = qtest_raw_in(s, 0xcfc, 'l');
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return tmp;
+}
+
+static void
+qtest_pci_outl(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset, uint32_t value)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   qtest_raw_out(s, 0xcfc, value, 'l');
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+}
+
+static uint64_t
+qtest_pci_inq(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset)
+{
+   uint32_t tmp;
+   uint64_t val;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   val = (uint64_t)qtest_raw_in(s, 0xcfc, 'l');
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   val |= (uint64_t)qtest_raw_in(s, 0xcfc, 'l') << 32;
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return val;
+}
+
+static void
+qtest_pci_outq(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset, uint64_t value)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   qtest_raw_out(s, 0xcfc, (uint32_t)(value & 0x), 'l');
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   qtest_raw_out(s, 0xcfc, (uint32_t)(value >> 32), 'l');
+
+   if (pthread_mutex_unlock(&s->qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+}
+
+/*
  * qtest_in/out are used for accessing ioport of qemu guest.
  * qtest_read/write are used for accessing memory of qemu guest.
  */
@@ -254,6 +373,18 @@ qtest_write(struct qtest_session *s, uint64_t addr, 
uint64_t val, char type)
rte_panic("Cannot lock mutex\n");
 }

+static struct 

[dpdk-dev] [PATCH v4 05/12] virtio, qtest: Add QTest utility basic functions

2016-03-09 Thread Tetsuya Mukawa
The patch adds basic functions for accessing a QEMU guest that runs in
QTest mode. The functions will be used by the virtio container extension
that can access the above guest.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_base   |   1 +
 drivers/net/virtio/Makefile  |   4 +
 drivers/net/virtio/qtest_utils.c | 480 +++
 drivers/net/virtio/qtest_utils.h | 119 ++
 4 files changed, 604 insertions(+)
 create mode 100644 drivers/net/virtio/qtest_utils.c
 create mode 100644 drivers/net/virtio/qtest_utils.h

diff --git a/config/common_base b/config/common_base
index 340feaf..b19cb59 100644
--- a/config/common_base
+++ b/config/common_base
@@ -260,6 +260,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
 # Enable virtio support for container
 #
 CONFIG_RTE_VIRTIO_VDEV=n
+CONFIG_RTE_VIRTIO_VDEV_QTEST=n

 #
 # Compile burst-oriented VMXNET3 PMD driver
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 9e83852..e6d5a04 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -59,6 +59,10 @@ ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += vhost_embedded.c
 endif

+ifeq ($(CONFIG_RTE_VIRTIO_VDEV_QTEST),y)
+   SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += qtest_utils.c
+endif
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/qtest_utils.c b/drivers/net/virtio/qtest_utils.c
new file mode 100644
index 000..f4cd6af
--- /dev/null
+++ b/drivers/net/virtio/qtest_utils.c
@@ -0,0 +1,480 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "qtest_utils.h"
+
+union qtest_pipefds {
+   struct {
+   int pipefd[2];
+   };
+   struct {
+   int readfd;
+   int writefd;
+   };
+};
+
+struct qtest_session {
+   int qtest_socket;
+   pthread_mutex_t qtest_session_lock;
+
+   pthread_t event_th;
+   int event_th_started;
+   char *evq;
+   char *evq_dequeue_ptr;
+   size_t evq_total_len;
+
+   union qtest_pipefds msgfds;
+};
+
+static int
+qtest_raw_send(int fd, char *buf, size_t count)
+{
+   size_t len = count;
+   size_t total_len = 0;
+   int ret = 0;
+
+   while (len > 0) {
+   ret = write(fd, buf, len);
+   if (ret == -1) {
+   if (errno == EINTR)
+   continue;
+   return ret;
+   }
+   if (ret == (int)len)
+   break;
+   total_len += ret;
+   buf += ret;
+   len -= ret;
+   }
+   return total_len + ret;
+}
+
+static int
+qtest_raw_recv(int fd, char *buf, size_t count)
+{
+   size_t len = count;
+   size_t total_len = 0;
+   int ret = 0;
+
+   while (len > 0) {
+   ret = read(fd, buf, len);
+   if (ret <= 0) {
+   if (errno == EINTR) {
+   continue;
+   }
+   return 

[dpdk-dev] [PATCH v4 04/12] EAL: Add a new "--align-memsize" option

2016-03-09 Thread Tetsuya Mukawa
The option will work with "--range-virtaddr", and if the option is
specified, the mapped address will be aligned to the EAL memory size.
Such an alignment is required for using the virtio-net PMD extension
in a container that uses the QEMU QTest framework.
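A rough numeric illustration (not from the patch): with 1 GB of EAL memory,
the limits of the candidate range are rounded as below, so that the whole
mapping can later be exposed through an ivshmem BAR:

 	/* Illustration of the rounding done in rte_eal_get_free_region() */
 	uint64_t alloc_size = 1ULL << 30;	/* 1 GB of EAL memory */
 	uint64_t low_limit = RTE_ALIGN_CEIL(0x7f1234560000ULL, alloc_size);
 	/* low_limit == 0x7f1240000000, i.e. aligned to 1 GB */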

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_options.c | 8 
 lib/librte_eal/common/eal_internal_cfg.h   | 1 +
 lib/librte_eal/common/eal_options.h| 2 ++
 lib/librte_eal/linuxapp/eal/eal.c  | 4 
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 9 +
 5 files changed, 24 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 3b4f789..853420a 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -75,6 +75,7 @@ const struct option
 eal_long_options[] = {
{OPT_BASE_VIRTADDR, 1, NULL, OPT_BASE_VIRTADDR_NUM},
{OPT_RANGE_VIRTADDR,1, NULL, OPT_RANGE_VIRTADDR_NUM   },
+   {OPT_ALIGN_MEMSIZE, 0, NULL, OPT_ALIGN_MEMSIZE_NUM},
{OPT_CREATE_UIO_DEV,0, NULL, OPT_CREATE_UIO_DEV_NUM   },
{OPT_FILE_PREFIX,   1, NULL, OPT_FILE_PREFIX_NUM  },
{OPT_HELP,  0, NULL, OPT_HELP_NUM },
@@ -140,6 +141,7 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
internal_cfg->base_virtaddr = 0;
internal_cfg->range_virtaddr_start = 0;
internal_cfg->range_virtaddr_end = 0;
+   internal_cfg->align_memsize = 0;

internal_cfg->syslog_facility = LOG_DAEMON;
/* default value from build option */
@@ -994,6 +996,12 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
return -1;
}

+   if (internal_cfg->range_virtaddr_end == 0 && 
internal_cfg->align_memsize) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_RANGE_VIRTADDR" should be "
+   "specified together with --"OPT_ALIGN_MEMSIZE"\n");
+   return -1;
+   }
+
return 0;
 }

diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index 0734630..df33a9f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -80,6 +80,7 @@ struct internal_config {
uintptr_t base_virtaddr;  /**< base address to try and reserve 
memory from */
uintptr_t range_virtaddr_start;   /**< start address of mappable region 
*/
uintptr_t range_virtaddr_end; /**< end address of mappable region */
+   volatile unsigned align_memsize;  /**< true to align virtaddr by memory 
size */
volatile int syslog_facility; /**< facility passed to openlog() */
volatile uint32_t log_level;  /**< default log level */
/** default interrupt mode for VFIO */
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index 8e4cf1d..9e36f68 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -49,6 +49,8 @@ enum {
OPT_BASE_VIRTADDR_NUM,
 #define OPT_RANGE_VIRTADDR"range-virtaddr"
OPT_RANGE_VIRTADDR_NUM,
+#define OPT_ALIGN_MEMSIZE "align-memsize"
+   OPT_ALIGN_MEMSIZE_NUM,
 #define OPT_CREATE_UIO_DEV"create-uio-dev"
OPT_CREATE_UIO_DEV_NUM,
 #define OPT_FILE_PREFIX   "file-prefix"
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 62b7a57..e2a0096 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -643,6 +643,10 @@ eal_parse_args(int argc, char **argv)
}
break;

+   case OPT_ALIGN_MEMSIZE_NUM:
+   internal_config.align_memsize = 1;
+   break;
+
case OPT_VFIO_INTR_NUM:
if (eal_parse_vfio_intr(optarg) < 0) {
RTE_LOG(ERR, EAL, "invalid parameters for --"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e15bf4c..1c9eb3c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -272,6 +272,15 @@ rte_eal_get_free_region(uint64_t pagesz)
return 0;
}

+   if (internal_config.align_memsize) {
+   /*
+* Typically, a BAR register of a PCI device requires such
+* an alignment.
+*/
+   low_limit = RTE_ALIGN_CEIL(low_limit, alloc_size);
+   high_limit = RTE_ALIGN_FLOOR(high_limit, alloc_size);
+   }
+
fp = fopen("/proc/self/maps", "r");
if (fp == NULL) {
rte_panic("Cannot open /proc/self/maps\n");
-- 
2.1.4



[dpdk-dev] [PATCH v4 03/12] EAL: Add a new "--range-virtaddr" option

2016-03-09 Thread Tetsuya Mukawa
The option specifies how to mmap EAL memory.
If the option is specified like '--range-virtaddr=addr1-addr2',
EAL will check /proc/maps, then try to find a free region between addr1
and addr2. If a region is found, EAL will treat it as if 'base-virtaddr'
were specified. Because of this, the option will not work with
'--base-virtaddr'.
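For example (the address values below are illustrative only):

 $ testpmd -c f -n 1 -m 1024 --no-pci --single-file \
  --range-virtaddr=0x100000000-0x2000000000 -- -i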

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_options.c |  9 
 lib/librte_eal/common/eal_internal_cfg.h   |  2 +
 lib/librte_eal/common/eal_options.h|  2 +
 lib/librte_eal/linuxapp/eal/eal.c  | 39 ++
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 82 +-
 5 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 65bccbd..3b4f789 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -74,6 +74,7 @@ eal_short_options[] =
 const struct option
 eal_long_options[] = {
{OPT_BASE_VIRTADDR, 1, NULL, OPT_BASE_VIRTADDR_NUM},
+   {OPT_RANGE_VIRTADDR,1, NULL, OPT_RANGE_VIRTADDR_NUM   },
{OPT_CREATE_UIO_DEV,0, NULL, OPT_CREATE_UIO_DEV_NUM   },
{OPT_FILE_PREFIX,   1, NULL, OPT_FILE_PREFIX_NUM  },
{OPT_HELP,  0, NULL, OPT_HELP_NUM },
@@ -137,6 +138,8 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
for (i = 0; i < MAX_HUGEPAGE_SIZES; i++)
internal_cfg->hugepage_info[i].lock_descriptor = -1;
internal_cfg->base_virtaddr = 0;
+   internal_cfg->range_virtaddr_start = 0;
+   internal_cfg->range_virtaddr_end = 0;

internal_cfg->syslog_facility = LOG_DAEMON;
/* default value from build option */
@@ -985,6 +988,12 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
return -1;
}

+   if (internal_cfg->base_virtaddr && internal_cfg->range_virtaddr_end) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_RANGE_VIRTADDR" cannot "
+   "be specified together with --"OPT_BASE_VIRTADDR"\n");
+   return -1;
+   }
+
return 0;
 }

diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index 9117ed9..0734630 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -78,6 +78,8 @@ struct internal_config {
volatile unsigned force_sockets;
volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory 
per socket */
uintptr_t base_virtaddr;  /**< base address to try and reserve 
memory from */
+   uintptr_t range_virtaddr_start;   /**< start address of mappable region 
*/
+   uintptr_t range_virtaddr_end; /**< end address of mappable region */
volatile int syslog_facility; /**< facility passed to openlog() */
volatile uint32_t log_level;  /**< default log level */
/** default interrupt mode for VFIO */
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index e5da14a..8e4cf1d 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -47,6 +47,8 @@ enum {
OPT_LONG_MIN_NUM = 256,
 #define OPT_BASE_VIRTADDR "base-virtaddr"
OPT_BASE_VIRTADDR_NUM,
+#define OPT_RANGE_VIRTADDR"range-virtaddr"
+   OPT_RANGE_VIRTADDR_NUM,
 #define OPT_CREATE_UIO_DEV"create-uio-dev"
OPT_CREATE_UIO_DEV_NUM,
 #define OPT_FILE_PREFIX   "file-prefix"
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 6bae02c..62b7a57 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -444,6 +444,35 @@ eal_parse_base_virtaddr(const char *arg)
 }

 static int
+eal_parse_range_virtaddr(const char *range)
+{
+   char *p, *endptr;
+   uint64_t tmp_start, tmp_end;
+
+   p = strchr(range, '-');
+   if (p == NULL)
+   return -1;
+   *p++ = '\0';
+
+   errno = 0;
+   tmp_start = strtoul(range, &endptr, 0);
+   if ((errno != 0) || endptr == NULL || (*endptr != '\0'))
+   return -1;
+
+   tmp_end = strtoul(p, &endptr, 0);
+   if ((errno != 0) || endptr == NULL || (*endptr != '\0'))
+   return -1;
+
+   if (tmp_start >= tmp_end)
+   return -1;
+
+   internal_config.range_virtaddr_start = tmp_start;
+   internal_config.range_virtaddr_end = tmp_end;
+
+   return 0;
+}
+
+static int
 eal_parse_vfio_intr(const char *mode)
 {
unsigned i;
@@ -604,6 +633,16 @@ eal_parse_args(int argc, char **argv)
}
break;

+   case OPT_RANGE_VIRTADDR_NUM:
+   if (eal_parse_range_virtaddr(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameter for --"
+  

[dpdk-dev] [PATCH v4 02/12] vhost: Add a function to check virtio device type

2016-03-09 Thread Tetsuya Mukawa
The patch adds the below function to clean up the virtio code.
 - virtio_dev_check()

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_ethdev.c | 52 ++
 drivers/net/virtio/virtio_ethdev.h | 32 +++
 2 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 429377b..bc631c7 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -371,7 +371,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->mz = mz;
vq->vq_ring_virt_mem = mz->addr;

-   if (dev->dev_type == RTE_ETH_DEV_PCI) {
+   if (virtio_dev_check(dev, RTE_ETH_DEV_PCI, NULL, 0)) {
vq->vq_ring_mem = mz->phys_addr;

/* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
@@ -429,7 +429,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_vaddr = mz->addr;
memset(vq->virtio_net_hdr_vaddr, 0, hdr_size);

-   if (dev->dev_type == RTE_ETH_DEV_PCI)
+   if (virtio_dev_check(dev, RTE_ETH_DEV_PCI, NULL, 0))
vq->virtio_net_hdr_mem = mz->phys_addr;
 #ifdef RTE_VIRTIO_VDEV
else
@@ -439,7 +439,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

hw->vtpci_ops->setup_queue(hw, vq);

-   if (dev->dev_type == RTE_ETH_DEV_PCI)
+   if (virtio_dev_check(dev, RTE_ETH_DEV_PCI, NULL, 0))
vq->offset = offsetof(struct rte_mbuf, buf_physaddr);
 #ifdef RTE_VIRTIO_VDEV
else
@@ -490,15 +490,13 @@ static void
 virtio_dev_close(struct rte_eth_dev *dev)
 {
struct virtio_hw *hw = dev->data->dev_private;
-   struct rte_pci_device *pci_dev = dev->pci_dev;

PMD_INIT_LOG(DEBUG, "virtio_dev_close");

/* reset the NIC */
-   if (dev->dev_type == RTE_ETH_DEV_PCI) {
-   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
-   }
+   if (virtio_dev_check(dev, RTE_ETH_DEV_PCI, NULL, RTE_PCI_DRV_INTR_LSC))
+   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+
vtpci_reset(hw);
hw->started = 0;
virtio_dev_free_mbufs(dev);
@@ -1001,7 +999,7 @@ virtio_interrupt_handler(__rte_unused struct 
rte_intr_handle *handle,
isr = vtpci_isr(hw);
PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);

-   if (dev->dev_type == RTE_ETH_DEV_PCI)
+   if (virtio_dev_check(dev, RTE_ETH_DEV_PCI, NULL, 0))
		if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
PMD_DRV_LOG(ERR, "interrupt enable failed");

@@ -1056,9 +1054,10 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

-   if (eth_dev->dev_type == RTE_ETH_DEV_PCI)
+   if (virtio_dev_check(eth_dev, RTE_ETH_DEV_PCI, NULL, 0)) {
if (vtpci_init(pci_dev, hw) < 0)
return -1;
+   }

/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1072,7 +1071,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
return -1;

/* If host does not support status then disable LSC */
-   if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
+   if (virtio_dev_check(eth_dev, RTE_ETH_DEV_PCI, NULL, 0)) {
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;

@@ -1154,13 +1153,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
-   if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
+   if (virtio_dev_check(eth_dev, RTE_ETH_DEV_PCI, NULL, 0)) {
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
 eth_dev->data->port_id, pci_dev->id.vendor_id,
 pci_dev->id.device_id);

/* Setup interrupt callback  */
-   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+   if (virtio_dev_check(eth_dev, RTE_ETH_DEV_PCI,
+   NULL, RTE_PCI_DRV_INTR_LSC))
rte_intr_callback_register(_dev->intr_handle,
   virtio_interrupt_handler,
   eth_dev);
@@ -1197,11 +1197,11 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
eth_dev->data->mac_addrs = NULL;

/* reset interrupt callback  */
-   if (eth_dev->dev_type == RTE_ETH_DEV_PCI)
-   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-   rte_intr_callback_unregister(&pci_dev->intr_handle,
-

[dpdk-dev] [PATCH v4 01/12] virtio: Retrieve driver name from eth_dev

2016-03-09 Thread Tetsuya Mukawa
Currently, virtio_dev_info_get() retrieves driver name from pci_drv.
If the driver is a virtual PMD, pci_drv will be invalid.
So retrieve the name from eth_dev instead.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index bff1926..429377b 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1438,7 +1438,7 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
 {
struct virtio_hw *hw = dev->data->dev_private;

-   dev_info->driver_name = dev->driver->pci_drv.name;
+   dev_info->driver_name = dev->data->drv_name;
dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
-- 
2.1.4



[dpdk-dev] [PATCH v4 00/12] Virtio-net PMD: QEMU QTest extension for container

2016-03-09 Thread Tetsuya Mukawa
The patches will work on below patch series.
 - [PATCH v2 0/5] virtio support for container

[Changes]
v4 changes:
 - Rebase on latest master.
 - Split patches.
 - To abstract qtest code more, change interface between current virtio
   code and qtest code.
 - Rename qtest.c to qtest_utils.c
 - Change implementation like below.
   - Set pci device information out of qtest abstraction, then pass it to
 qtest to initialize devices.
 - Remove redundant condition checking from qtest_raw_send/recv().
 - Fix return value of qtest_raw_send().

v3 changes:
 - Rebase on latest master.
 - remove "-qtest-virtio" option, then add "--range-virtaddr" and
   "--align-memsize" options.
 - Fix typos in qtest.c

v2 changes:
 - Rebase on the above patch series.
 - Rebase on master
 - Add "--qtest-virtio" EAL option.
 - Fixes in qtest.c
  - Fix error handling for the case qtest connection is closed.
  - Use eventfd for interrupt messaging.
  - Use linux header for PCI register definitions.
  - Fix qtest_raw_send/recv to handle error correctly.
  - Fix bit mask of PCI_CONFIG_ADDR.
  - Describe memory and ioport usage of qtest guest in qtest.c
  - Remove loop that is for finding PCI devices.


[Abstraction]

Normally, virtio-net PMD only works on a VM, because there is no virtio-net
device on the host.
These patches extend virtio-net PMD to be able to work on the host as a virtual PMD.
But we didn't implement the virtio-net device as a part of the virtio-net PMD.
To prepare a virtio-net device for the PMD, start a QEMU process in the special
QTest mode, then connect to it from the virtio-net PMD through a unix domain socket.

The PMD can connect to anywhere the QEMU virtio-net device can.
For example, the PMD can connect to the vhost-net kernel module and a vhost-user
backend application.
As with virtio-net PMD on QEMU, the memory of the application that uses the
virtio-net PMD will be shared with the vhost backend application.
But the vhost backend application's memory will not be shared.

The main target of this PMD is containers like docker, rkt, lxc, etc.
We can isolate the related processes (virtio-net PMD process, QEMU and vhost-user
backend process) with containers.
But, to communicate through a unix domain socket, a shared directory will be needed.


[How to use]

 Please use QEMU-2.5.1, or above.
 (So far, QEMU-2.5.1 hasn't been released yet, so please check out master from
 the QEMU repository.)

 - Compile
 Set "CONFIG_RTE_VIRTIO_VDEV_QTEST=y" in config/common_linux.
 Then compile it.

 - Start QEMU like below.
 $ qemu-system-x86_64 \
  -machine pc-i440fx-1.4,accel=qtest \
  -display none -qtest-log /dev/null \
  -qtest unix:/tmp/socket,server \
  -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
  -device 
virtio-net-pci,netdev=net0,mq=on,disable-modern=false,addr=3 \
  -chardev socket,id=chr1,path=/tmp/ivshmem,server \
  -device ivshmem,size=1G,chardev=chr1,vectors=1,addr=4

 - Start DPDK application like below
 $ testpmd -c f -n 1 -m 1024 --no-pci --single-file --qtest-virtio \
 --vdev="eth_qtest_virtio0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem"\
 -- --disable-hw-vlan --txqflags=0xf00 -i

(*1) Please specify the same memory size in the QEMU and DPDK command lines.
(*2) Should use qemu-2.5.1, or above.
(*3) A QEMU process is needed per port.
(*4) Only virtio-1.0 devices are supported.
(*5) The vhost backends like vhost-net and vhost-user can be specified.
(*6) In most cases, just using the above command is enough, but you can also
 specify other QEMU virtio-net options.
(*7) Only the "pc-i440fx-1.4" machine has been checked, but it may work with
 other machines. It depends on whether the machine has a piix3 south bridge.
 If the machine doesn't have one, virtio-net PMD cannot receive status
 change interrupts.
(*8) Do not add "--enable-kvm" to the QEMU command line.


[Detailed Description]

 - virtio-net device implementation
The PMD uses the QEMU virtio-net device. To do that, the QEMU QTest
functionality is used.
QTest is a test framework for QEMU devices. It allows us to implement a device
driver outside of QEMU.
With QTest, we can implement the DPDK application and virtio-net PMD as a
standalone process on the host.
When QEMU is invoked in QTest mode, no guest code will run.
To know more about QTest, see below.
http://wiki.qemu.org/Features/QTest
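For reference, QTest is a line-based text protocol over this unix domain
socket. A fragment of a session, sketched from the accesses this PMD performs
(responses may differ depending on the QEMU version), looks roughly like:

 -> "outl 0xcf8 0x80001800"   (select bus 0, device 3, function 0, offset 0)
 <- "OK"
 -> "inl 0xcfc"               (read the vendor/device ID dword)
 <- "OK 0x10001af4"           (device ID 0x1000, vendor ID 0x1af4)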

 - probing devices
QTest provides a unix domain socket. Through this socket, the driver process can
access the I/O ports and memory of the QEMU virtual machine.
The PMD will send I/O port accesses to probe pci devices.
If the virtio-net and ivshmem devices are found, they are initialized.
Also, I/O port accesses of the virtio-net PMD will be sent through the socket, so
the virtio-net PMD can initialize the virtio-net device on QEMU correctly.
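A hedged sketch of such probing, using the helpers and constants added by this
series (the bus number and loop bound are assumptions):

 	/* Sketch: scan bus 0 for the virtio-net device by vendor/device ID */
 	uint8_t dev_addr;

 	for (dev_addr = 0; dev_addr < 32; dev_addr++) {
 		uint32_t id = qtest_pci_inl(s, 0, dev_addr, 0, 0);
 		if (id == ((VIRTIO_NET_DEVICE_ID << 16) | VIRTIO_NET_VENDOR_ID))
 			break;	/* virtio-net found at bus 0, device dev_addr */
 	}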

 - ivshmem device to share memory
To share the memory that the virtio-net PMD process uses, an ivshmem device will be used.
Because an ivshmem device can only handle one file descriptor, the shared memory
should consist of one file.
To allocate such a memory, EAL has new 

[dpdk-dev] [PATCH v3 4/4] mempool: add in the RTE_NEXT_ABI for ABI breakages

2016-03-09 Thread Olivier MATZ
Hi David,

On 03/09/2016 05:28 PM, Hunt, David wrote:
>> Sorry, maybe I wasn't very clear in my previous messages. For me, the
>> NEXT_ABI is not the proper solution because, as Panu stated, it makes
>> the patch hard to read. My understanding of NEXT_ABI is that it should
>> only be used if the changes are small enough. Duplicating the code with
>> a big #ifdef NEXT_ABI is not an option to me either.
>>
>> So that's why the deprecation notice should be used instead. But in this
>> case, it means that this patch won't be present in 16.04, but will be
>> added in 16.07.
>>
> Sure, v4 will remove the NEXT_ABI patch, and replace it with just the
> ABI break announcement for 16.07. For anyone who wants to try out the
> patch, they can always get it from patchwork, but not as part of 16.04.

I think it's better to have the deprecation notice in a separate
mail, outside of the patch series, so Thomas can just apply this
one and leave the series pending for 16.07.

Thanks,
Olivier


[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Thomas Monjalon
2016-03-09 16:17, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-03-09 15:42, Ananyev, Konstantin:
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > 2016-03-09 15:23, Ananyev, Konstantin:
> > > > > >
> > > > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > > > +   if (to_send == 0)
> > > > > > > > > +   return 0;
> > > > > > > >
> > > > > > > > Why this check is done in the lib?
> > > > > > > > What is the performance gain if we are idle?
> > > > > > > > It can be done outside if needed.
> > > > > > >
> > > > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > > > why not to put it inside?
> > > > > > > I don't expect any performance gain/loss because of that -
> > > > > > > just seems a bit more convenient to the user.
> > > > > >
> > > > > > It is handling an idle case so there is no gain obviously.
> > > > > > But the condition branching is surely a loss.
> > > > >
> > > > > I suppose that condition should always be checked:
> > > > > either in user code prior to function call or inside the
> > > > > function call itself.
> > > > > So don't expect any difference in performance here...
> > > > > Do you have any particular example when you think it would?
> > > > > Or are you talking about rte_eth_tx_buffer() calling
> > > > > rte_eth_tx_buffer_flush() internally?
> > > > > For that one - both are flush is 'static inline' , so I expect
> > > > > compiler be smart enough to remove this redundant check.
> > > > >
> > > > > > So why the user would you like to do this check?
> > > > > Just for user convenience - to save him doing that manually.
> > > >
> > > > Probably I've missed something. If we remove this check, the function
> > > > will do nothing, right? How is it changing the behaviour?
> > >
> > > If we'll remove that check, then
> > > rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
> > > will be called.
> > > So in that case it might be even slower, as we'll have to do a proper 
> > > call.
> > 
> > If there is no packet, we have time to do a useless call.
> 
> One lcore can do TX for several queues/ports.
> Let say we have N queues to handle, but right now traffic is going only 
> through
> one of them. 
> That means we'll have to do N-1 useless calls and reduce number of cycles
> available to send actual traffic.

OK, good justification, thanks.

> > > Of course user can avoid it by:
> > >
> > > If(tx_buffer->nb_pkts != 0)
> > >   rte_eth_tx_buffer_flush(port, queue, tx_buffer);
> > >
> > > But as I said what for to force user to do that?
> > > Why not to  make this check inside the function?
> > 
> > Because it may be slower when there are some packets
> > and will "accelerate" only the no-packet case.
> > 
> > We do not progress in this discussion.
> > It is not a big deal, 
> 
> Exactly.
> 
> >just a non sense.
> 
> Look at what most of current DPDK examples do: they do check manually
> does nb_pkts==0 or not, if not call tx_burst().
> For me it makes sense to move that check into the library function -
> so each and every caller doesn't have to do it manually.
> 
> > So I agree to keep it if we change the website to announce that DPDK
> > accelerates the idle processing ;)
> 
> That's fine by me, but at first I suppose you'll have to provide some data
> showing that this approach slowdowns things, right? :)

You got me


[dpdk-dev] [RFC 35/35] mempool: update copyright

2016-03-09 Thread Olivier Matz
Update the copyright of files touched by this patch series.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 1 +
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 209449a..3851edd 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7599790..56220a4 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
-- 
2.1.4



[dpdk-dev] [RFC 34/35] mempool: new flag when phys contig mem is not needed

2016-03-09 Thread Olivier Matz
Add a new flag to remove the constraint of having physically contiguous
objects inside a mempool.

Add this flag to the log history mempool to start, but we could add
it in most cases where objects are not mbufs.
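A minimal usage sketch (the pool parameters and element type are placeholders):

 	/* A pool of plain control blocks that are never DMA'd, so physical
 	 * contiguity of the objects is not required. */
 	mp = rte_mempool_create("ctrl_pool", 8192, sizeof(struct my_ctrl),
 		64, 0,
 		NULL, NULL, NULL, NULL,
 		SOCKET_ID_ANY, MEMPOOL_F_NO_PHYS_CONTIG);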

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/eal_common_log.c |  2 +-
 lib/librte_mempool/rte_mempool.c   | 23 ---
 lib/librte_mempool/rte_mempool.h   |  5 +
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index 1ae8de7..9122b34 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -322,7 +322,7 @@ rte_eal_common_log_init(FILE *default_log)
LOG_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
-   SOCKET_ID_ANY, 0);
+   SOCKET_ID_ANY, MEMPOOL_F_NO_PHYS_CONTIG);

if ((log_history_mp == NULL) &&
((log_history_mp = rte_mempool_lookup(LOG_HISTORY_MP_NAME)) == 
NULL)){
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 397e6ec..209449a 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -412,7 +412,11 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,

while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
off += mp->header_size;
-   mempool_add_elem(mp, (char *)vaddr + off, paddr + off);
+   if (paddr == RTE_BAD_PHYS_ADDR)
+   mempool_add_elem(mp, (char *)vaddr + off,
+   RTE_BAD_PHYS_ADDR);
+   else
+   mempool_add_elem(mp, (char *)vaddr + off, paddr + off);
off += mp->elt_size + mp->trailer_size;
i++;
}
@@ -441,6 +445,10 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, char 
*vaddr,
if (mp->nb_mem_chunks != 0)
return -EEXIST;

+   if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+   return rte_mempool_populate_phys(mp, vaddr, RTE_BAD_PHYS_ADDR,
+   pg_num * pg_sz, free_cb, opaque);
+
for (i = 0; i < pg_num && mp->populated_size < mp->size; i += n) {

/* populate with the largest group of contiguous pages */
@@ -481,6 +489,10 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char 
*addr,
if (RTE_ALIGN_CEIL(len, pg_sz) != len)
return -EINVAL;

+   if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+   return rte_mempool_populate_phys(mp, addr, RTE_BAD_PHYS_ADDR,
+   len, free_cb, opaque);
+
for (off = 0; off + pg_sz <= len &&
 mp->populated_size < mp->size; off += phys_len) {

@@ -530,6 +542,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
char mz_name[RTE_MEMZONE_NAMESIZE];
const struct rte_memzone *mz;
size_t size, total_elt_sz, align, pg_sz, pg_shift;
+   phys_addr_t paddr;
unsigned mz_id, n;
int ret;

@@ -569,10 +582,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
goto fail;
}

-   /* use memzone physical address if it is valid */
+   if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+   paddr = RTE_BAD_PHYS_ADDR;
+   else
+   paddr = mz->phys_addr;
+
if (rte_eal_has_hugepages() && !rte_xen_dom0_supported())
ret = rte_mempool_populate_phys(mp, mz->addr,
-   mz->phys_addr, mz->len,
+   paddr, mz->len,
rte_mempool_memchunk_mz_free,
RTE_DECONST(void *, mz));
else
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7a3e652..7599790 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -235,6 +235,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
"single-producer".*/
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
"single-consumer".*/
 #define MEMPOOL_F_RING_CREATED   0x0010 /**< Internal: ring is created */
+#define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous 
objs. */

 /**
  * @internal When debug is enabled, store some statistics.
@@ -416,6 +417,8 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, 
void *);
  *   - MEMPOOL_F_SC_GET: If this flag is set, the default behavior
  * when using rte_mempool_get() or rte_mempool_get_bulk() is
  * "single-consumer". Otherwise, it is "multi-consumers".
+ *   - MEMPOOL_F_NO_PHYS_CONTIG: If set, allocated objects won't
+ * necessarily be contiguous in physical memory.
  * @return
  

[dpdk-dev] [RFC 33/35] mem: avoid memzone/mempool/ring name truncation

2016-03-09 Thread Olivier Matz
Check the return value of snprintf to ensure that the name of
the object is not truncated.

By the way, update the test to avoid triggering an error in
that case.
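Behavioural sketch (not part of the patch): after this change, creating an
object whose name exceeds the memzone/ring name limit fails cleanly instead
of being silently truncated, e.g.:

 	mp = rte_mempool_create("a_mempool_name_that_is_far_too_long_for_the_memzone",
 		MEMPOOL_SIZE, MEMPOOL_ELT_SIZE, 0, 0,
 		NULL, NULL, NULL, NULL, SOCKET_ID_ANY, 0);
 	if (mp == NULL)
 		printf("creation refused as expected, rte_errno=%d\n", rte_errno);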

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c| 12 
 lib/librte_eal/common/eal_common_memzone.c | 10 +-
 lib/librte_mempool/rte_mempool.c   | 20 
 lib/librte_ring/rte_ring.c | 16 +---
 4 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 80d95d5..93098b3 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -407,21 +407,25 @@ test_mempool_same_name_twice_creation(void)
 {
struct rte_mempool *mp_tc;

-   mp_tc = rte_mempool_create("test_mempool_same_name_twice_creation", 
MEMPOOL_SIZE,
+   mp_tc = rte_mempool_create("test_mempool_same_name", MEMPOOL_SIZE,
MEMPOOL_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
SOCKET_ID_ANY, 0);
-   if (NULL == mp_tc)
+   if (NULL == mp_tc) {
+   printf("cannot create mempool\n");
return -1;
+   }

-   mp_tc = rte_mempool_create("test_mempool_same_name_twice_creation", 
MEMPOOL_SIZE,
+   mp_tc = rte_mempool_create("test_mempool_same_name", MEMPOOL_SIZE,
MEMPOOL_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
SOCKET_ID_ANY, 0);
-   if (NULL != mp_tc)
+   if (NULL != mp_tc) {
+   printf("should not be able to create mempool\n");
return -1;
+   }

return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index 711c845..774eb5d 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -126,6 +126,7 @@ static const struct rte_memzone *
 memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
+   struct rte_memzone *mz;
struct rte_mem_config *mcfg;
size_t requested_len;
int socket, i;
@@ -148,6 +149,13 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
return NULL;
}

+   if (strlen(name) >= sizeof(mz->name) - 1) {
+   RTE_LOG(DEBUG, EAL, "%s(): memzone <%s>: name too long\n",
+   __func__, name);
+   rte_errno = EEXIST;
+   return NULL;
+   }
+
/* if alignment is not a power of two */
if (align && !rte_is_power_of_2(align)) {
RTE_LOG(ERR, EAL, "%s(): Invalid alignment: %u\n", __func__,
@@ -223,7 +231,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);

/* fill the zone in config */
-   struct rte_memzone *mz = get_next_free_memzone();
+   mz = get_next_free_memzone();

if (mz == NULL) {
RTE_LOG(ERR, EAL, "%s(): Cannot find free memzone but there is 
room "
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2a7d6cd..397e6ec 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -305,11 +305,14 @@ rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t 
elt_num,
 static int
 rte_mempool_ring_create(struct rte_mempool *mp)
 {
-   int rg_flags = 0;
+   int rg_flags = 0, ret;
char rg_name[RTE_RING_NAMESIZE];
struct rte_ring *r;

-   snprintf(rg_name, sizeof(rg_name), RTE_MEMPOOL_MZ_FORMAT, mp->name);
+   ret = snprintf(rg_name, sizeof(rg_name),
+   RTE_MEMPOOL_MZ_FORMAT, mp->name);
+   if (ret < 0 || ret >= (int)sizeof(rg_name))
+   return -ENAMETOOLONG;

/* ring flags */
if (mp->flags & MEMPOOL_F_SP_PUT)
@@ -688,6 +691,7 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
size_t mempool_size;
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
struct rte_mempool_objsz objsz;
+   int ret;

/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
@@ -741,7 +745,11 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);

-   snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT, name);
+   ret = snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT, name);
+   if (ret < 0 || ret 

[dpdk-dev] [RFC 32/35] mempool: make mempool populate and free api public

2016-03-09 Thread Olivier Matz
Add the following functions to the public mempool API:

- rte_mempool_create_empty()
- rte_mempool_populate_phys()
- rte_mempool_populate_phys_tab()
- rte_mempool_populate_virt()
- rte_mempool_populate_default()
- rte_mempool_populate_anon()
- rte_mempool_free()
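
For illustration, a minimal sketch of the intended two-step usage with the
functions exported by this patch (error handling shortened, names and sizes
arbitrary):

	#include <string.h>
	#include <rte_mempool.h>

	static void
	my_obj_init(struct rte_mempool *mp, void *opaque __rte_unused,
		    void *obj, unsigned obj_idx __rte_unused)
	{
		memset(obj, 0, mp->elt_size);
	}

	static struct rte_mempool *
	make_pool(void)
	{
		struct rte_mempool *mp;

		mp = rte_mempool_create_empty("my_pool", 8192, 2048, 256, 0,
					      SOCKET_ID_ANY, 0);
		if (mp == NULL)
			return NULL;

		/* allocate memory chunks in memzones and add the objects */
		if (rte_mempool_populate_default(mp) < 0) {
			rte_mempool_free(mp);
			return NULL;
		}

		/* initialize each object, as rte_mempool_create() would do */
		rte_mempool_obj_iter(mp, my_obj_init, NULL);
		return mp;
	}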

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c   |  14 +--
 lib/librte_mempool/rte_mempool.h   | 168 +
 lib/librte_mempool/rte_mempool_version.map |   9 +-
 3 files changed, 183 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 1f5ba50..2a7d6cd 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -367,7 +367,7 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
 /* Add objects in the pool, using a physically contiguous memory
  * zone. Return the number of objects added, or a negative value
  * on error. */
-static int
+int
 rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
phys_addr_t paddr, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
void *opaque)
@@ -425,7 +425,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,

 /* Add objects in the pool, using a table of physical pages. Return the
  * number of objects added, or a negative value on error. */
-static int
+int
 rte_mempool_populate_phys_tab(struct rte_mempool *mp, char *vaddr,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
rte_mempool_memchunk_free_cb_t *free_cb, void *opaque)
@@ -460,7 +460,7 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, char 
*vaddr,

 /* Populate the mempool with a virtual area. Return the number of
  * objects added, or a negative value on error. */
-static int
+int
 rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
size_t len, size_t pg_sz, rte_mempool_memchunk_free_cb_t *free_cb,
void *opaque)
@@ -520,7 +520,7 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char 
*addr,
 /* Default function to populate the mempool: allocate memory in memzones,
  * and populate them. Return the number of objects added, or a negative
  * value on error. */
-static int
+int
 rte_mempool_populate_default(struct rte_mempool *mp)
 {
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
@@ -611,7 +611,7 @@ rte_mempool_memchunk_anon_free(struct rte_mempool_memhdr 
*memhdr,
 }

 /* populate the mempool with an anonymous mapping */
-__rte_unused static int
+int
 rte_mempool_populate_anon(struct rte_mempool *mp)
 {
size_t size;
@@ -646,7 +646,7 @@ rte_mempool_populate_anon(struct rte_mempool *mp)
 }

 /* free a mempool */
-static void
+void
 rte_mempool_free(struct rte_mempool *mp)
 {
struct rte_mempool_list *mempool_list = NULL;
@@ -675,7 +675,7 @@ rte_mempool_free(struct rte_mempool *mp)
 }

 /* create an empty mempool */
-static struct rte_mempool *
+struct rte_mempool *
 rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
unsigned cache_size, unsigned private_data_size,
int socket_id, unsigned flags)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e0549c6..7a3e652 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -501,6 +501,174 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift);

 /**
+ * Create an empty mempool
+ *
+ * The mempool is allocated and initialized, but it is not populated: no
+ * memory is allocated for the mempool elements. The user has to call
+ * rte_mempool_populate_*() to add memory chunks to the pool. Once
+ * populated, the user may also want to initialize each object with
+ * rte_mempool_obj_iter().
+ *
+ * @param name
+ *   The name of the mempool.
+ * @param n
+ *   The maximum number of elements that can be added in the mempool.
+ *   The optimum size (in terms of memory usage) for a mempool is when n
+ *   is a power of two minus one: n = (2^q - 1).
+ * @param elt_size
+ *   The size of each element.
+ * @param cache_size
+ *   Size of the cache. See rte_mempool_create() for details.
+ * @param private_data_size
+ *   The size of the private data appended after the mempool
+ *   structure. This is useful for storing some private data after the
+ *   mempool structure, as is done for rte_mbuf_pool for example.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in the case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. See rte_mempool_create() for details.
+ */
+struct rte_mempool *

[dpdk-dev] [RFC 31/35] test-pmd: remove specific anon mempool code

2016-03-09 Thread Olivier Matz
Now that the mempool library provides functions to populate a mempool
with anonymous mmap'd memory, we can remove this specific code from test-pmd.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/Makefile|   4 -
 app/test-pmd/mempool_anon.c  | 201 ---
 app/test-pmd/mempool_osdep.h |  54 
 app/test-pmd/testpmd.c   |  17 ++--
 4 files changed, 11 insertions(+), 265 deletions(-)
 delete mode 100644 app/test-pmd/mempool_anon.c
 delete mode 100644 app/test-pmd/mempool_osdep.h

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 72426f3..40039a1 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -58,11 +58,7 @@ SRCS-y += txonly.c
 SRCS-y += csumonly.c
 SRCS-y += icmpecho.c
 SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c
-SRCS-y += mempool_anon.c

-ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
-CFLAGS_mempool_anon.o := -D_GNU_SOURCE
-endif
 CFLAGS_cmdline.o := -D_GNU_SOURCE

 # this application needs libraries first
diff --git a/app/test-pmd/mempool_anon.c b/app/test-pmd/mempool_anon.c
deleted file mode 100644
index 5e23848..000
--- a/app/test-pmd/mempool_anon.c
+++ /dev/null
@@ -1,201 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include 
-#include 
-#include "mempool_osdep.h"
-#include 
-
-#ifdef RTE_EXEC_ENV_LINUXAPP
-
-#include 
-#include 
-#include 
-
-
-#definePAGEMAP_FNAME   "/proc/self/pagemap"
-
-/*
- * the pfn (page frame number) are bits 0-54 (see pagemap.txt in linux
- * Documentation).
- */
-#definePAGEMAP_PFN_BITS54
-#definePAGEMAP_PFN_MASKRTE_LEN2MASK(PAGEMAP_PFN_BITS, 
phys_addr_t)
-
-
-static int
-get_phys_map(void *va, phys_addr_t pa[], uint32_t pg_num, uint32_t pg_sz)
-{
-   int32_t fd, rc;
-   uint32_t i, nb;
-   off_t ofs;
-
-   ofs = (uintptr_t)va / pg_sz * sizeof(*pa);
-   nb = pg_num * sizeof(*pa);
-
-   if ((fd = open(PAGEMAP_FNAME, O_RDONLY)) < 0)
-   return ENOENT;
-
-   if ((rc = pread(fd, pa, nb, ofs)) < 0 || (rc -= nb) != 0) {
-
-   RTE_LOG(ERR, USER1, "failed read of %u bytes from \'%s\' "
-   "at offset %zu, error code: %d\n",
-   nb, PAGEMAP_FNAME, (size_t)ofs, errno);
-   rc = ENOENT;
-   }
-
-   close(fd);
-
-   for (i = 0; i != pg_num; i++)
-   pa[i] = (pa[i] & PAGEMAP_PFN_MASK) * pg_sz;
-
-   return rc;
-}
-
-struct rte_mempool *
-mempool_anon_create(const char *name, unsigned elt_num, unsigned elt_size,
-  unsigned cache_size, unsigned private_data_size,
-  rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-  int socket_id, unsigned flags)
-{
-   struct rte_mempool *mp;
-   phys_addr_t *pa;
-   char *va, *uv;
-   uint32_t n, pg_num, pg_shift, pg_sz, total_size;
-   size_t sz;
-   ssize_t usz;
-   int32_t rc;
-
-   rc = ENOMEM;
-   mp = NULL;
-
-   pg_sz = getpagesize();
-   if (rte_is_power_of_2(pg_sz) == 0) {
-   rte_errno = EINVAL;
-   return mp;
-   }
-
-   pg_shift = rte_bsf32(pg_sz);
-
-   total_size = rte_mempool_calc_obj_size(elt_size, flags, NULL);
-
-   

[dpdk-dev] [RFC 30/35] mempool: populate a mempool with anonymous memory

2016-03-09 Thread Olivier Matz
Now that we can populate a mempool with any virtual memory,
it is easier to introduce a function to populate a mempool
with memory coming from an anonymous mapping, as is done
in test-pmd.

The next commit will replace the test-pmd anonymous mapping with
this function.
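
A rough usage sketch, assuming the function is later exported as done in
patch 32 of this series (sizes arbitrary):

	static struct rte_mempool *
	make_anon_pool(void)
	{
		struct rte_mempool *mp;

		mp = rte_mempool_create_empty("anon_pool", 1024, 2048, 32, 0,
					      SOCKET_ID_ANY, 0);
		if (mp == NULL)
			return NULL;

		/* mmap() an anonymous, locked area and attach it to the pool;
		 * the function below returns 0 and sets rte_errno on failure */
		if (rte_mempool_populate_anon(mp) == 0) {
			rte_mempool_free(mp);
			return NULL;
		}
		return mp;
	}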

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 58 
 1 file changed, 58 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2546740..1f5ba50 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -587,6 +588,63 @@ rte_mempool_populate_default(struct rte_mempool *mp)
return ret;
 }

+/* return the memory size required for mempool objects in anonymous mem */
+static size_t
+get_anon_size(const struct rte_mempool *mp)
+{
+   size_t size, total_elt_sz, pg_sz, pg_shift;
+
+   pg_sz = getpagesize();
+   pg_shift = rte_bsf32(pg_sz);
+   total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+   size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift);
+
+   return size;
+}
+
+/* unmap a memory zone mapped by rte_mempool_populate_anon() */
+static void
+rte_mempool_memchunk_anon_free(struct rte_mempool_memhdr *memhdr,
+   void *opaque)
+{
+   munmap(opaque, get_anon_size(memhdr->mp));
+}
+
+/* populate the mempool with an anonymous mapping */
+__rte_unused static int
+rte_mempool_populate_anon(struct rte_mempool *mp)
+{
+   size_t size;
+   int ret;
+   char *addr;
+
+   /* mempool is already populated, error */
+   if (!STAILQ_EMPTY(&mp->mem_list)) {
+   rte_errno = EINVAL;
+   return 0;
+   }
+
+   /* get chunk of virtually continuous memory */
+   size = get_anon_size(mp);
+   addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
+   MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
+   if (addr == MAP_FAILED) {
+   rte_errno = errno;
+   return 0;
+   }
+
+   ret = rte_mempool_populate_virt(mp, addr, size, getpagesize(),
+   rte_mempool_memchunk_anon_free, addr);
+   if (ret == 0)
+   goto fail;
+
+   return mp->populated_size;
+
+ fail:
+   rte_mempool_free_memchunks(mp);
+   return 0;
+}
+
 /* free a mempool */
 static void
 rte_mempool_free(struct rte_mempool *mp)
-- 
2.1.4



[dpdk-dev] [RFC 29/35] mempool: create the internal ring when populating

2016-03-09 Thread Olivier Matz
Instead of creating the internal ring at mempool creation, do
it when populating the mempool with the first memory chunk. The
objective here is to simplify switching to an external handler
when it is introduced.

For instance, this will be possible:

  mp = rte_mempool_create_empty(...)
  rte_mempool_set_ext_handler(mp, my_handler)
  rte_mempool_populate_default()

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 12 +---
 lib/librte_mempool/rte_mempool.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 0f4cb4e..2546740 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -326,6 +326,7 @@ rte_mempool_ring_create(struct rte_mempool *mp)
return -rte_errno;

mp->ring = r;
+   mp->flags |= MEMPOOL_F_RING_CREATED;
return 0;
 }

@@ -374,6 +375,14 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
unsigned i = 0;
size_t off;
struct rte_mempool_memhdr *memhdr;
+   int ret;
+
+   /* create the internal ring if not already done */
+   if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
+   ret = rte_mempool_ring_create(mp);
+   if (ret < 0)
+   return ret;
+   }

/* mempool is already populated */
if (mp->populated_size >= mp->size)
@@ -698,9 +707,6 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
STAILQ_INIT(>elt_list);
STAILQ_INIT(>mem_list);

-   if (rte_mempool_ring_create(mp) < 0)
-   goto exit_unlock;
-
/*
 * local_cache pointer is set even if cache_size is zero.
 * The local_cache points to just past the elt_pa[] array.
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 47743a6..e0549c6 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -234,6 +234,7 @@ struct rte_mempool {
 #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache 
lines.*/
 #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
"single-producer".*/
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
"single-consumer".*/
+#define MEMPOOL_F_RING_CREATED   0x0010 /**< Internal: ring is created */

 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.1.4



[dpdk-dev] [RFC 28/35] mempool: rework support of xen dom0

2016-03-09 Thread Olivier Matz
Avoid having a specific file for that, and remove the #ifdefs.
Now that we have introduced a function to populate a mempool
with a virtual area, the support of xen dom0 is much easier.

The only thing we need to do is to convert the guest physical
address into the machine physical address using rte_mem_phy2mch().
This function does nothing when not running xen.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/Makefile|   3 -
 lib/librte_mempool/rte_dom0_mempool.c  | 133 -
 lib/librte_mempool/rte_mempool.c   |  33 ++-
 lib/librte_mempool/rte_mempool.h   |  89 ---
 lib/librte_mempool/rte_mempool_version.map |   1 -
 5 files changed, 5 insertions(+), 254 deletions(-)
 delete mode 100644 lib/librte_mempool/rte_dom0_mempool.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 706f844..43423e0 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -42,9 +42,6 @@ LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
-ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_dom0_mempool.c
-endif
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_dom0_mempool.c 
b/lib/librte_mempool/rte_dom0_mempool.c
deleted file mode 100644
index dad755c..000
--- a/lib/librte_mempool/rte_dom0_mempool.c
+++ /dev/null
@@ -1,133 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "rte_mempool.h"
-
-static void
-get_phys_map(void *va, phys_addr_t pa[], uint32_t pg_num,
-   uint32_t pg_sz, uint32_t memseg_id)
-{
-   uint32_t i;
-   uint64_t virt_addr, mfn_id;
-   struct rte_mem_config *mcfg;
-   uint32_t page_size = getpagesize();
-
-   /* get pointer to global configuration */
-   mcfg = rte_eal_get_configuration()->mem_config;
-   virt_addr = (uintptr_t) mcfg->memseg[memseg_id].addr;
-
-   for (i = 0; i != pg_num; i++) {
-   mfn_id = ((uintptr_t)va + i * pg_sz - virt_addr) / 
RTE_PGSIZE_2M;
-   pa[i] = mcfg->memseg[memseg_id].mfn[mfn_id] * page_size;
-   }
-}
-
-/* create the mempool for supporting Dom0 */
-struct rte_mempool *
-rte_dom0_mempool_create(const char *name, unsigned elt_num, unsigned elt_size,
-   unsigned cache_size, unsigned private_data_size,
-   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-   int socket_id, unsigned flags)
-{
-   struct rte_mempool *mp = NULL;
-   phys_addr_t *pa;
-   char *va;
-   size_t sz;
-   uint32_t pg_num, pg_shift, pg_sz, total_size;
-   const struct rte_memzone *mz;
-   char mz_name[RTE_MEMZONE_NAMESIZE];
-   int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
-
-   pg_sz = RTE_PGSIZE_2M;
-
-   pg_shift = rte_bsf32(pg_sz);
-   total_size = rte_mempool_calc_obj_size(elt_size, flags, NULL);
-
-   /* 

[dpdk-dev] [RFC 27/35] eal/xen: return machine address without knowing memseg id

2016-03-09 Thread Olivier Matz
The conversion from guest physical address to machine physical address
is fast when the caller knows the memseg corresponding to the gpa.

But if the caller does not know this information, just find it
by browsing the segments. This feature will be used by the next commit.
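
A small sketch of the new calling convention (the caller no longer has to
track the memseg index itself):

	#include <rte_memory.h>

	static phys_addr_t
	virt2machine(const void *addr)
	{
		phys_addr_t pa = rte_mem_virt2phy(addr); /* guest physical */

		/* -1: let the EAL browse the memsegs itself;
		 * returns RTE_BAD_PHYS_ADDR if no segment matches */
		return rte_mem_phy2mch(-1, pa);
	}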

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/include/rte_memory.h   | 11 ++-
 lib/librte_eal/linuxapp/eal/eal_xen_memory.c | 17 +++--
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_memory.h 
b/lib/librte_eal/common/include/rte_memory.h
index f8dbece..0661109 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -200,21 +200,22 @@ unsigned rte_memory_get_nrank(void);
 int rte_xen_dom0_supported(void);

 /**< Internal use only - phys to virt mapping for xen */
-phys_addr_t rte_xen_mem_phy2mch(uint32_t, const phys_addr_t);
+phys_addr_t rte_xen_mem_phy2mch(int32_t, const phys_addr_t);

 /**
  * Return the physical address of elt, which is an element of the pool mp.
  *
  * @param memseg_id
- *   The mempool is from which memory segment.
+ *   Identifier of the memory segment owning the physical address. If
+ *   set to -1, find it automatically.
  * @param phy_addr
  *   physical address of elt.
  *
  * @return
- *   The physical address or error.
+ *   The physical address or RTE_BAD_PHYS_ADDR on error.
  */
 static inline phys_addr_t
-rte_mem_phy2mch(uint32_t memseg_id, const phys_addr_t phy_addr)
+rte_mem_phy2mch(int32_t memseg_id, const phys_addr_t phy_addr)
 {
if (rte_xen_dom0_supported())
return rte_xen_mem_phy2mch(memseg_id, phy_addr);
@@ -250,7 +251,7 @@ static inline int rte_xen_dom0_supported(void)
 }

 static inline phys_addr_t
-rte_mem_phy2mch(uint32_t memseg_id __rte_unused, const phys_addr_t phy_addr)
+rte_mem_phy2mch(int32_t memseg_id __rte_unused, const phys_addr_t phy_addr)
 {
return phy_addr;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_xen_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_xen_memory.c
index 495eef9..efbd374 100644
--- a/lib/librte_eal/linuxapp/eal/eal_xen_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_xen_memory.c
@@ -156,13 +156,26 @@ get_xen_memory_size(void)
  * Based on physical address to caculate MFN in Xen Dom0.
  */
 phys_addr_t
-rte_xen_mem_phy2mch(uint32_t memseg_id, const phys_addr_t phy_addr)
+rte_xen_mem_phy2mch(int32_t memseg_id, const phys_addr_t phy_addr)
 {
-   int mfn_id;
+   int mfn_id, i;
uint64_t mfn, mfn_offset;
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg *memseg = mcfg->memseg;

+   /* find the memory segment owning the physical address */
+   if (memseg_id == -1) {
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+   if ((phy_addr >= memseg[i].phys_addr) &&
+   (phy_addr < memseg[i].phys_addr + 
memseg[i].size)) {
+   memseg_id = i;
+   break;
+   }
+   }
+   if (memseg_id == -1)
+   return RTE_BAD_PHYS_ADDR;
+   }
+
mfn_id = (phy_addr - memseg[memseg_id].phys_addr) / RTE_PGSIZE_2M;

/*the MFN is contiguous in 2M */
-- 
2.1.4



[dpdk-dev] [RFC 26/35] mempool: introduce a function to create an empty mempool

2016-03-09 Thread Olivier Matz
Introduce a new function rte_mempool_create_empty()
that allocates a mempool that is not populated.

The functions rte_mempool_create() and rte_mempool_xmem_create()
now make use of it, making their code much easier to read.
Currently, they are the only users of rte_mempool_create_empty(),
but the function will be made public in later commits.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 185 ++-
 1 file changed, 107 insertions(+), 78 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 4b74ffd..afb2992 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -320,30 +320,6 @@ rte_dom0_mempool_create(const char *name __rte_unused,
 }
 #endif

-/* create the mempool */
-struct rte_mempool *
-rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
-  unsigned cache_size, unsigned private_data_size,
-  rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
-  int socket_id, unsigned flags)
-{
-   if (rte_xen_dom0_supported())
-   return rte_dom0_mempool_create(name, n, elt_size,
-  cache_size, private_data_size,
-  mp_init, mp_init_arg,
-  obj_init, obj_init_arg,
-  socket_id, flags);
-   else
-   return rte_mempool_xmem_create(name, n, elt_size,
-  cache_size, private_data_size,
-  mp_init, mp_init_arg,
-  obj_init, obj_init_arg,
-  socket_id, flags,
-  NULL, NULL, 
MEMPOOL_PG_NUM_DEFAULT,
-  MEMPOOL_PG_SHIFT_MAX);
-}
-
 /* create the internal ring */
 static int
 rte_mempool_ring_create(struct rte_mempool *mp)
@@ -647,20 +623,11 @@ rte_mempool_free(struct rte_mempool *mp)
rte_memzone_free(mp->mz);
 }

-/*
- * Create the mempool over already allocated chunk of memory.
- * That external memory buffer can consists of physically disjoint pages.
- * Setting vaddr to NULL, makes mempool to fallback to original behaviour
- * and allocate space for mempool and it's elements as one big chunk of
- * physically continuos memory.
- * */
-struct rte_mempool *
-rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
-   unsigned cache_size, unsigned private_data_size,
-   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-   int socket_id, unsigned flags, void *vaddr,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
+/* create an empty mempool */
+static struct rte_mempool *
+rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
+   unsigned cache_size, unsigned private_data_size,
+   int socket_id, unsigned flags)
 {
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_mempool_list *mempool_list;
@@ -670,7 +637,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
size_t mempool_size;
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
struct rte_mempool_objsz objsz;
-   int ret;

/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
@@ -693,18 +659,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
return NULL;
}

-   /* check that we have both VA and PA */
-   if (vaddr != NULL && paddr == NULL) {
-   rte_errno = EINVAL;
-   return NULL;
-   }
-
-   /* Check that pg_num and pg_shift parameters are valid. */
-   if (pg_num == 0 || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
-   rte_errno = EINVAL;
-   return NULL;
-   }
-
/* "no cache align" imply "no spread" */
if (flags & MEMPOOL_F_NO_CACHE_ALIGN)
flags |= MEMPOOL_F_NO_SPREAD;
@@ -732,11 +686,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
goto exit_unlock;
}

-   /*
-* If user provided an external memory buffer, then use it to
-* store mempool objects. Otherwise reserve a memzone that is large
-* enough to hold mempool header and metadata plus mempool objects.
-*/
mempool_size = MEMPOOL_HEADER_SIZE(mp, cache_size);
mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
@@ -748,12 +697,14 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned 

[dpdk-dev] [RFC 25/35] mempool: introduce a function to free a mempool

2016-03-09 Thread Olivier Matz
Introduce rte_mempool_free() that:

- unlinks the mempool from the global list if it is found
- frees all the memory chunks using their free callbacks
- frees the internal ring
- frees the memzone containing the mempool

Currently this function is only used in error cases when
creating a new mempool, but it will be made public later
in the patch series.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 9e2b72b..4b74ffd 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -618,6 +618,35 @@ rte_mempool_populate_default(struct rte_mempool *mp)
return ret;
 }

+/* free a mempool */
+static void
+rte_mempool_free(struct rte_mempool *mp)
+{
+   struct rte_mempool_list *mempool_list = NULL;
+   struct rte_tailq_entry *te;
+
+   if (mp == NULL)
+   return;
+
+   mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+   /* find out tailq entry */
+   TAILQ_FOREACH(te, mempool_list, next) {
+   if (te->data == (void *)mp)
+   break;
+   }
+
+   if (te != NULL) {
+   TAILQ_REMOVE(mempool_list, te, next);
+   rte_free(te);
+   }
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+   rte_mempool_free_memchunks(mp);
+   rte_ring_free(mp->ring);
+   rte_memzone_free(mp->mz);
+}
+
 /*
  * Create the mempool over already allocated chunk of memory.
  * That external memory buffer can consists of physically disjoint pages.
@@ -777,12 +806,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,

 exit_unlock:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
-   if (mp != NULL) {
-   rte_mempool_free_memchunks(mp);
-   rte_ring_free(mp->ring);
-   }
-   rte_free(te);
-   rte_memzone_free(mz);
+   rte_mempool_free(mp);

return NULL;
 }
-- 
2.1.4



[dpdk-dev] [RFC 24/35] mempool: replace mempool physaddr by a memzone pointer

2016-03-09 Thread Olivier Matz
Storing the pointer to the memzone instead of the physical address
provides more information than just the physical address: for instance,
the memzone flags.

Moreover, keeping the memzone pointer will allow us to free the mempool
(this is done later in the series).

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 4 ++--
 lib/librte_mempool/rte_mempool.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7ec6709..9e2b72b 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -721,7 +721,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
/* init the mempool structure */
memset(mp, 0, sizeof(*mp));
snprintf(mp->name, sizeof(mp->name), "%s", name);
-   mp->phys_addr = mz->phys_addr;
+   mp->mz = mz;
mp->socket_id = socket_id;
mp->size = n;
mp->flags = flags;
@@ -985,7 +985,7 @@ rte_mempool_dump(FILE *f, const struct rte_mempool *mp)
fprintf(f, "mempool <%s>@%p\n", mp->name, mp);
fprintf(f, "  flags=%x\n", mp->flags);
fprintf(f, "  ring=<%s>@%p\n", mp->ring->name, mp->ring);
-   fprintf(f, "  phys_addr=0x%" PRIx64 "\n", mp->phys_addr);
+   fprintf(f, "  phys_addr=0x%" PRIx64 "\n", mp->mz->phys_addr);
fprintf(f, "  nb_mem_chunks=%u\n", mp->nb_mem_chunks);
fprintf(f, "  size=%"PRIu32"\n", mp->size);
fprintf(f, "  populated_size=%"PRIu32"\n", mp->populated_size);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7222c14..05241e1 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -203,7 +203,7 @@ struct rte_mempool_memhdr {
 struct rte_mempool {
char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
struct rte_ring *ring;   /**< Ring to store objects. */
-   phys_addr_t phys_addr;   /**< Phys. addr. of mempool struct. */
+   const struct rte_memzone *mz;/**< Memzone where mempool is 
allocated */
int flags;   /**< Flags of the mempool. */
int socket_id;   /**< Socket id passed at mempool 
creation. */
uint32_t size;   /**< Max size of the mempool. */
-- 
2.1.4



[dpdk-dev] [RFC 23/35] mempool: support no-hugepage mode

2016-03-09 Thread Olivier Matz
Introduce a new function rte_mempool_populate_virt() that is now called
by default when hugepages are not supported. This function populates the
mempool with several physically contiguous chunks whose minimum size is
the page size of the system.

Thanks to this, rte_mempool_create() will work properly without
hugepages (if the object size is smaller than a page size), and two
specific workarounds can be removed:

- trailer_size was artificially extended to a page size
- rte_mempool_virt2phy() did not rely on object physical address

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 106 ++-
 lib/librte_mempool/rte_mempool.h |  19 ++-
 2 files changed, 86 insertions(+), 39 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7fd2bb4..7ec6709 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -224,23 +224,6 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,
sz->trailer_size = new_size - sz->header_size - sz->elt_size;
}

-   if (! rte_eal_has_hugepages()) {
-   /*
-* compute trailer size so that pool elements fit exactly in
-* a standard page
-*/
-   int page_size = getpagesize();
-   int new_size = page_size - sz->header_size - sz->elt_size;
-   if (new_size < 0 || (unsigned int)new_size < sz->trailer_size) {
-   printf("When hugepages are disabled, pool objects "
-  "can't exceed PAGE_SIZE: %d + %d + %d > %d\n",
-  sz->header_size, sz->elt_size, sz->trailer_size,
-  page_size);
-   return 0;
-   }
-   sz->trailer_size = new_size;
-   }
-
/* this is the size of an object, including header and trailer */
sz->total_size = sz->header_size + sz->elt_size + sz->trailer_size;

@@ -509,15 +492,72 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, 
char *vaddr,
return cnt;
 }

-/* Default function to populate the mempool: allocate memory in mezones,
+/* Populate the mempool with a virtual area. Return the number of
+ * objects added, or a negative value on error. */
+static int
+rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
+   size_t len, size_t pg_sz, rte_mempool_memchunk_free_cb_t *free_cb,
+   void *opaque)
+{
+   phys_addr_t paddr;
+   size_t off, phys_len;
+   int ret, cnt = 0;
+
+   /* mempool must not be populated */
+   if (mp->nb_mem_chunks != 0)
+   return -EEXIST;
+   /* address and len must be page-aligned */
+   if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
+   return -EINVAL;
+   if (RTE_ALIGN_CEIL(len, pg_sz) != len)
+   return -EINVAL;
+
+   for (off = 0; off + pg_sz <= len &&
+mp->populated_size < mp->size; off += phys_len) {
+
+   paddr = rte_mem_virt2phy(addr + off);
+   if (paddr == RTE_BAD_PHYS_ADDR) {
+   ret = -EINVAL;
+   goto fail;
+   }
+
+   /* populate with the largest group of contiguous pages */
+   for (phys_len = pg_sz; off + phys_len < len; phys_len += pg_sz) 
{
+   phys_addr_t paddr_tmp;
+
+   paddr_tmp = rte_mem_virt2phy(addr + off + phys_len);
+   paddr_tmp = rte_mem_phy2mch(-1, paddr_tmp);
+
+   if (paddr_tmp != paddr + phys_len)
+   break;
+   }
+
+   ret = rte_mempool_populate_phys(mp, addr + off, paddr,
+   phys_len, free_cb, opaque);
+   if (ret < 0)
+   goto fail;
+   /* no need to call the free callback for next chunks */
+   free_cb = NULL;
+   cnt += ret;
+   }
+
+   return cnt;
+
+ fail:
+   rte_mempool_free_memchunks(mp);
+   return ret;
+}
+
+/* Default function to populate the mempool: allocate memory in memzones,
  * and populate them. Return the number of objects added, or a negative
  * value on error. */
-static int rte_mempool_populate_default(struct rte_mempool *mp)
+static int
+rte_mempool_populate_default(struct rte_mempool *mp)
 {
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
char mz_name[RTE_MEMZONE_NAMESIZE];
const struct rte_memzone *mz;
-   size_t size, total_elt_sz, align;
+   size_t size, total_elt_sz, align, pg_sz, pg_shift;
unsigned mz_id, n;
int ret;

@@ -525,10 +565,19 @@ static int rte_mempool_populate_default(struct 
rte_mempool *mp)
if (mp->nb_mem_chunks != 0)
return -EEXIST;

-   align = RTE_CACHE_LINE_SIZE;
+   if (rte_eal_has_hugepages()) {
+ 

[dpdk-dev] [RFC 22/35] eal: lock memory when using no-huge

2016-03-09 Thread Olivier Matz
Although the physical address stored in the memory segment won't be
correct, this at least allows the physical address to be retrieved
using rte_mem_virt2phy(). Indeed, if the page is not locked, it
may not be present in physical memory.

Together with the next commit, this allows a mempool to have properly
filled physical addresses when using the --no-huge option.

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 6008533..c2a5799 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1105,7 +1105,7 @@ rte_eal_hugepage_init(void)
/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
addr = mmap(NULL, internal_config.memory, PROT_READ | 
PROT_WRITE,
-   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   MAP_LOCKED | MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
-- 
2.1.4



[dpdk-dev] [RFC 21/35] mempool: default allocation in several memory chunks

2016-03-09 Thread Olivier Matz
Introduce rte_mempool_populate_default() which allocates
mempool objects in several memzones.

The mempool header is now always allocated in a specific memzone
(not with its objects). Thanks to this modification, we can remove
many specific behavior that was required when hugepages are not
enabled in case we are using rte_mempool_xmem_create().

This change requires updating how the kni and mellanox drivers look up
mbuf memory. This will only work if there is only one memory chunk (like
today), but we could make use of rte_mempool_mem_iter() to support more
memory chunks.

We can also remove RTE_MEMPOOL_OBJ_NAME, which is no longer required for
the lookup, as memory chunks are referenced by the mempool.

Note that rte_mempool_create() is still broken (as it was before)
when there is no hugepage support (rte_mempool_xmem_create() has to be
used). This is fixed in the next commit.
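
For illustration, a sketch of what multi-chunk support could look like in
a driver, walking the mem_list introduced earlier in the series;
drv_register_region() stands for a hypothetical driver-specific helper,
not a DPDK API:

	int drv_register_region(void *ctx, void *addr, size_t len); /* hypothetical */

	static int
	register_all_chunks(void *drv_ctx, struct rte_mempool *mp)
	{
		struct rte_mempool_memhdr *memhdr;

		STAILQ_FOREACH(memhdr, &mp->mem_list, next) {
			/* register each physically contiguous chunk */
			if (drv_register_region(drv_ctx, memhdr->addr,
						memhdr->len) < 0)
				return -1;
		}
		return 0;
	}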

Signed-off-by: Olivier Matz 
---
 drivers/net/mlx4/mlx4.c   |  18 --
 drivers/net/mlx5/mlx5_rxq.c   |   9 ++-
 drivers/net/mlx5/mlx5_rxtx.c  |   9 ++-
 lib/librte_kni/rte_kni.c  |  12 +++-
 lib/librte_mempool/rte_dom0_mempool.c |   2 +-
 lib/librte_mempool/rte_mempool.c  | 116 +++---
 lib/librte_mempool/rte_mempool.h  |  11 
 7 files changed, 102 insertions(+), 75 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d9b2291..405324c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1237,9 +1237,14 @@ txq_mp2mr(struct txq *txq, const struct rte_mempool *mp)
/* Add a new entry, register MR first. */
DEBUG("%p: discovered new memory pool \"%s\" (%p)",
  (void *)txq, mp->name, (const void *)mp);
+   if (mp->nb_mem_chunks != 1) {
+   DEBUG("%p: only 1 memory chunk is supported in mempool",
+   (void *)txq);
+   return (uint32_t)-1;
+   }
mr = ibv_reg_mr(txq->priv->pd,
-   (void *)mp->elt_va_start,
-   (mp->elt_va_end - mp->elt_va_start),
+   (void *)STAILQ_FIRST(&mp->mem_list)->addr,
+   STAILQ_FIRST(&mp->mem_list)->len,
(IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
if (unlikely(mr == NULL)) {
DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
@@ -3675,6 +3680,11 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
  " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
+   if (mp->nb_mem_chunks != 1) {
+   ERROR("%p: only 1 memory chunk is supported in mempool",
+   (void *)dev);
+   return EINVAL;
+   }
/* Get mbuf length. */
buf = rte_pktmbuf_alloc(mp);
if (buf == NULL) {
@@ -3702,8 +3712,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
  (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
/* Use the entire RX mempool as the memory region. */
tmpl.mr = ibv_reg_mr(priv->pd,
-(void *)mp->elt_va_start,
-(mp->elt_va_end - mp->elt_va_start),
+(void *)STAILQ_FIRST(&mp->mem_list)->addr,
+STAILQ_FIRST(&mp->mem_list)->len,
 (IBV_ACCESS_LOCAL_WRITE |
  IBV_ACCESS_REMOTE_WRITE));
if (tmpl.mr == NULL) {
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index ebbe186..1513b37 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1113,6 +1113,11 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
  " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
return EINVAL;
}
+   if (mp->nb_mem_chunks != 1) {
+   ERROR("%p: only 1 memory chunk is supported in mempool",
+   (void *)dev);
+   return EINVAL;
+   }
/* Get mbuf length. */
buf = rte_pktmbuf_alloc(mp);
if (buf == NULL) {
@@ -1140,8 +1145,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
  (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
/* Use the entire RX mempool as the memory region. */
tmpl.mr = ibv_reg_mr(priv->pd,
-(void *)mp->elt_va_start,
-(mp->elt_va_end - mp->elt_va_start),
+(void *)STAILQ_FIRST(&mp->mem_list)->addr,
+STAILQ_FIRST(&mp->mem_list)->len,
 (IBV_ACCESS_LOCAL_WRITE |
  IBV_ACCESS_REMOTE_WRITE));
if (tmpl.mr == NULL) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index f002ca2..4ff88fc 100644
--- 

[dpdk-dev] [RFC 20/35] mempool: make page size optional when getting xmem size

2016-03-09 Thread Olivier Matz
Update rte_mempool_xmem_size() so that when the page_shift argument is
set to 0, memory is assumed to be physically contiguous, allowing page
boundaries to be ignored. This will be used in the next commits.

By the way, rename the variable 'n' to 'obj_per_page' and avoid the
assignment inside the if().
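
A short worked example of the new behaviour (arbitrary values: 1000
objects of 2304 bytes each, header + elt + trailer included):

	#include <stdio.h>
	#include <rte_mempool.h>

	static void
	xmem_size_example(void)
	{
		/* pg_shift = 0: memory assumed physically contiguous,
		 * 1000 * 2304 = 2304000 bytes */
		size_t contig = rte_mempool_xmem_size(1000, 2304, 0);

		/* pg_shift = 12 (4 KB pages): only one object fits per page,
		 * so 1000 pages are needed, i.e. 4096000 bytes */
		size_t paged = rte_mempool_xmem_size(1000, 2304, 12);

		printf("contiguous: %zu, paged: %zu\n", contig, paged);
	}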

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 18 +-
 lib/librte_mempool/rte_mempool.h |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 5bfe4cb..805ac19 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -254,18 +254,18 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
 {
-   size_t n, pg_num, pg_sz, sz;
+   size_t obj_per_page, pg_num, pg_sz;

-   pg_sz = (size_t)1 << pg_shift;
+   if (pg_shift == 0)
+   return total_elt_sz * elt_num;

-   if ((n = pg_sz / total_elt_sz) > 0) {
-   pg_num = (elt_num + n - 1) / n;
-   sz = pg_num << pg_shift;
-   } else {
-   sz = RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;
-   }
+   pg_sz = (size_t)1 << pg_shift;
+   obj_per_page = pg_sz / total_elt_sz;
+   if (obj_per_page == 0)
+   return RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;

-   return sz;
+   pg_num = (elt_num + obj_per_page - 1) / obj_per_page;
+   return pg_num << pg_shift;
 }

 /*
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index dacdf6c..2cce7ee 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1257,7 +1257,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, 
uint32_t flags,
  *   The size of each element, including header and trailer, as returned
  *   by rte_mempool_calc_obj_size().
  * @param pg_shift
- *   LOG2 of the physical pages size.
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
  * @return
  *   Required memory size aligned at page boundary.
  */
-- 
2.1.4



[dpdk-dev] [RFC 18/35] mempool: simplify xmem_usage

2016-03-09 Thread Olivier Matz
Since the previous commit, the function rte_mempool_xmem_usage() is
the last user of rte_mempool_obj_mem_iter(). This complex
code can now be moved inside the function. We can get rid of the
callback and simplify the code to make it more readable.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 138 +++
 1 file changed, 37 insertions(+), 101 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 0220fa3..905387f 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -126,15 +126,6 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
 }

-/**
- * A mempool object iterator callback function.
- */
-typedef void (*rte_mempool_obj_iter_t)(void * /*obj_iter_arg*/,
-   void * /*obj_start*/,
-   void * /*obj_end*/,
-   uint32_t /*obj_index */,
-   phys_addr_t /*physaddr*/);
-
 static void
 mempool_add_elem(struct rte_mempool *mp, void *obj, phys_addr_t physaddr)
 {
@@ -158,74 +149,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
rte_ring_sp_enqueue(mp->ring, obj);
 }

-/* Iterate through objects at the given address
- *
- * Given the pointer to the memory, and its topology in physical memory
- * (the physical addresses table), iterate through the "elt_num" objects
- * of size "elt_sz" aligned at "align". For each object in this memory
- * chunk, invoke a callback. It returns the effective number of objects
- * in this memory. */
-static uint32_t
-rte_mempool_obj_mem_iter(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
-   size_t align, const phys_addr_t paddr[], uint32_t pg_num,
-   uint32_t pg_shift, rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
-{
-   uint32_t i, j, k;
-   uint32_t pgn, pgf;
-   uintptr_t end, start, va;
-   uintptr_t pg_sz;
-   phys_addr_t physaddr;
-
-   pg_sz = (uintptr_t)1 << pg_shift;
-   va = (uintptr_t)vaddr;
-
-   i = 0;
-   j = 0;
-
-   while (i != elt_num && j != pg_num) {
-
-   start = RTE_ALIGN_CEIL(va, align);
-   end = start + total_elt_sz;
-
-   /* index of the first page for the next element. */
-   pgf = (end >> pg_shift) - (start >> pg_shift);
-
-   /* index of the last page for the current element. */
-   pgn = ((end - 1) >> pg_shift) - (start >> pg_shift);
-   pgn += j;
-
-   /* do we have enough space left for the element. */
-   if (pgn >= pg_num)
-   break;
-
-   for (k = j;
-   k != pgn &&
-   paddr[k] + pg_sz == paddr[k + 1];
-   k++)
-   ;
-
-   /*
-* if next pgn chunks of memory physically continuous,
-* use it to create next element.
-* otherwise, just skip that chunk unused.
-*/
-   if (k == pgn) {
-   physaddr = paddr[k] + (start & (pg_sz - 1));
-   if (obj_iter != NULL)
-   obj_iter(obj_iter_arg, (void *)start,
-   (void *)end, i, physaddr);
-   va = end;
-   j += pgf;
-   i++;
-   } else {
-   va = RTE_ALIGN_CEIL((va + 1), pg_sz);
-   j++;
-   }
-   }
-
-   return i;
-}
-
 /* call obj_cb() for each mempool element */
 uint32_t
 rte_mempool_obj_iter(struct rte_mempool *mp,
@@ -345,40 +268,53 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t 
total_elt_sz, uint32_t pg_shift)
return sz;
 }

-/* Callback used by rte_mempool_xmem_usage(): it sets the opaque
- * argument to the end of the object. */
-static void
-mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
-   __rte_unused uint32_t idx, __rte_unused phys_addr_t physaddr)
-{
-   *(uintptr_t *)arg = (uintptr_t)end;
-}
-
 /*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
 ssize_t
-rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
+rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
+   size_t total_elt_sz, const phys_addr_t paddr[], uint32_t pg_num,
+   uint32_t pg_shift)
 {
-   uint32_t n;
-   uintptr_t va, uv;
-   size_t pg_sz, usz;
+   uint32_t elt_cnt = 0;
+   phys_addr_t start, end;
+   uint32_t paddr_idx;
+   size_t pg_sz = (size_t)1 << pg_shift;

-   pg_sz = (size_t)1 << pg_shift;
-   va = (uintptr_t)vaddr;
-   uv = va;
+   /* if paddr is NULL, assume contiguous 

[dpdk-dev] [RFC 17/35] mempool: new function to iterate the memory chunks

2016-03-09 Thread Olivier Matz
Following the same model as rte_mempool_obj_iter(), introduce
rte_mempool_mem_iter() to iterate over the memory chunks attached
to the mempool.
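
For illustration, a minimal callback matching the prototype added by this
patch; it only reports each chunk and treats the 'mem' pointer as opaque:

	#include <stdio.h>
	#include <rte_mempool.h>

	static void
	dump_chunk(struct rte_mempool *mp, void *opaque, void *mem,
		   unsigned mem_idx)
	{
		FILE *f = opaque;

		fprintf(f, "%s: memory chunk %u at %p\n",
			mp->name, mem_idx, mem);
	}

	/* returns the number of memory chunks attached to the pool */
	static unsigned
	dump_pool_chunks(struct rte_mempool *mp)
	{
		return rte_mempool_mem_iter(mp, dump_chunk, stdout);
	}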

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c   | 18 ++
 lib/librte_mempool/rte_mempool.h   | 26 ++
 lib/librte_mempool/rte_mempool_version.map |  1 +
 3 files changed, 45 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index ff84f81..0220fa3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -244,6 +244,24 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
return n;
 }

+/* call mem_cb() for each mempool memory chunk */
+uint32_t
+rte_mempool_mem_iter(struct rte_mempool *mp,
+   rte_mempool_mem_cb_t *mem_cb, void *mem_cb_arg)
+{
+   struct rte_mempool_memhdr *hdr;
+   void *mem;
+   unsigned n = 0;
+
+   STAILQ_FOREACH(hdr, &mp->mem_list, next) {
+   mem = (char *)hdr + sizeof(*hdr);
+   mem_cb(mp, mem_cb_arg, mem, n);
+   n++;
+   }
+
+   return n;
+}
+
 /* get the header, trailer and total size of a mempool element. */
 uint32_t
 rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 08bfe05..184d40d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -332,6 +332,14 @@ typedef void (rte_mempool_obj_cb_t)(struct rte_mempool *mp,
 typedef rte_mempool_obj_cb_t rte_mempool_obj_ctor_t; /* compat */

 /**
+ * A memory callback function for mempool.
+ *
+ * Used by rte_mempool_mem_iter().
+ */
+typedef void (rte_mempool_mem_cb_t)(struct rte_mempool *mp,
+   void *opaque, void *mem, unsigned mem_idx);
+
+/**
  * A mempool constructor callback function.
  *
  * Arguments are the mempool and the opaque pointer given by the user in
@@ -602,6 +610,24 @@ uint32_t rte_mempool_obj_iter(struct rte_mempool *mp,
rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg);

 /**
+ * Call a function for each mempool memory chunk
+ *
+ * Iterate across all memory chunks attached to a rte_mempool and call
+ * the callback function on it.
+ *
+ * @param mp
+ *   A pointer to an initialized mempool.
+ * @param mem_cb
+ *   A function pointer that is called for each memory chunk.
+ * @param mem_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   Number of memory chunks iterated.
+ */
+uint32_t rte_mempool_mem_iter(struct rte_mempool *mp,
+   rte_mempool_mem_cb_t *mem_cb, void *mem_cb_arg);
+
+/**
  * Dump the status of the mempool to the console.
  *
  * @param f
diff --git a/lib/librte_mempool/rte_mempool_version.map 
b/lib/librte_mempool/rte_mempool_version.map
index 4db75ca..ca887b5 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -21,6 +21,7 @@ DPDK_16.07 {
global:

rte_mempool_obj_iter;
+   rte_mempool_mem_iter;

local: *;
 } DPDK_2.0;
-- 
2.1.4



[dpdk-dev] [RFC 16/35] mempool: store memory chunks in a list

2016-03-09 Thread Olivier Matz
Do not use the paddr table to store the mempool memory chunks.
This will allow having several chunks with different virtual addresses.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c  |   2 +-
 lib/librte_mempool/rte_mempool.c | 205 ++-
 lib/librte_mempool/rte_mempool.h |  51 +-
 3 files changed, 165 insertions(+), 93 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 1503bcf..80d95d5 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -123,7 +123,7 @@ test_mempool_basic(void)

printf("get private data\n");
if (rte_mempool_get_priv(mp) != (char *)mp +
-   MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
+   MEMPOOL_HEADER_SIZE(mp, mp->cache_size))
return -1;

 #ifndef RTE_EXEC_ENV_BSD /* rte_mem_virt2phy() not supported on bsd */
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7aedc89..ff84f81 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -141,14 +141,12 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
struct rte_mempool_objhdr *hdr;
struct rte_mempool_objtlr *tlr __rte_unused;

-   obj = (char *)obj + mp->header_size;
-   physaddr += mp->header_size;
-
/* set mempool ptr in header */
hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
hdr->mp = mp;
hdr->physaddr = physaddr;
STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
+   mp->populated_size++;

 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
@@ -246,33 +244,6 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
return n;
 }

-/*
- * Populate  mempool with the objects.
- */
-
-static void
-mempool_obj_populate(void *arg, void *start, void *end,
-   __rte_unused uint32_t idx, phys_addr_t physaddr)
-{
-   struct rte_mempool *mp = arg;
-
-   mempool_add_elem(mp, start, physaddr);
-   mp->elt_va_end = (uintptr_t)end;
-}
-
-static void
-mempool_populate(struct rte_mempool *mp, size_t num, size_t align)
-{
-   uint32_t elt_sz;
-
-   elt_sz = mp->elt_size + mp->header_size + mp->trailer_size;
-
-   mp->size = rte_mempool_obj_mem_iter((void *)mp->elt_va_start,
-   num, elt_sz, align,
-   mp->elt_pa, mp->pg_num, mp->pg_shift,
-   mempool_obj_populate, mp);
-}
-
 /* get the header, trailer and total size of a mempool element. */
 uint32_t
 rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
@@ -465,6 +436,108 @@ rte_mempool_ring_create(struct rte_mempool *mp)
return 0;
 }

+/* Free memory chunks used by a mempool. Objects must be in pool */
+static void
+rte_mempool_free_memchunks(struct rte_mempool *mp)
+{
+   struct rte_mempool_memhdr *memhdr;
+   void *elt;
+
+   while (!STAILQ_EMPTY(&mp->elt_list)) {
+   rte_ring_sc_dequeue(mp->ring, &elt);
+   (void)elt;
+   STAILQ_REMOVE_HEAD(&mp->elt_list, next);
+   mp->populated_size--;
+   }
+
+   while (!STAILQ_EMPTY(&mp->mem_list)) {
+   memhdr = STAILQ_FIRST(&mp->mem_list);
+   STAILQ_REMOVE_HEAD(&mp->mem_list, next);
+   rte_free(memhdr);
+   mp->nb_mem_chunks--;
+   }
+}
+
+/* Add objects in the pool, using a physically contiguous memory
+ * zone. Return the number of objects added, or a negative value
+ * on error. */
+static int
+rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
+   phys_addr_t paddr, size_t len)
+{
+   unsigned total_elt_sz;
+   unsigned i = 0;
+   size_t off;
+   struct rte_mempool_memhdr *memhdr;
+
+   /* mempool is already populated */
+   if (mp->populated_size >= mp->size)
+   return -ENOSPC;
+
+   total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+   memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
+   if (memhdr == NULL)
+   return -ENOMEM;
+
+   memhdr->mp = mp;
+   memhdr->addr = vaddr;
+   memhdr->phys_addr = paddr;
+   memhdr->len = len;
+
+   if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+   off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
+   else
+   off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
+
+   while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
+   off += mp->header_size;
+   mempool_add_elem(mp, (char *)vaddr + off, paddr + off);
+   off += mp->elt_size + mp->trailer_size;
+   i++;
+   }
+
+   /* not enough room to store one object */
+   if (i == 0)
+   return -EINVAL;
+
+   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
+   mp->nb_mem_chunks++;
+   return i;
+}
+
+/* Add objects in the pool, using a table of physical pages. Return the
+ * number of objects added, 

[dpdk-dev] [RFC 15/35] mempool: remove MEMPOOL_IS_CONTIG()

2016-03-09 Thread Olivier Matz
The next commits will change the behavior of the mempool library so that
the objects will never be allocated in the same memzone as the mempool
header. Therefore, there is no reason to keep this macro, which would
always return 0.

This macro was only used in app/test.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c  | 7 +++
 lib/librte_mempool/rte_mempool.h | 7 ---
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 10e1fa4..1503bcf 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -126,12 +126,11 @@ test_mempool_basic(void)
MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
return -1;

+#ifndef RTE_EXEC_ENV_BSD /* rte_mem_virt2phy() not supported on bsd */
printf("get physical address of an object\n");
-   if (MEMPOOL_IS_CONTIG(mp) &&
-   rte_mempool_virt2phy(mp, obj) !=
-   (phys_addr_t) (mp->phys_addr +
-   (phys_addr_t) ((char*) obj - (char*) mp)))
+   if (rte_mempool_virt2phy(mp, obj) != rte_mem_virt2phy(obj))
return -1;
+#endif

printf("put the object back\n");
rte_mempool_put(mp, obj);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index f32d705..3bfdf4d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -271,13 +271,6 @@ struct rte_mempool {
(sizeof(*(mp)) + __PA_SIZE(mp, pgn) + (((cs) == 0) ? 0 : \
(sizeof(struct rte_mempool_cache) * RTE_MAX_LCORE)))

-/**
- * Return true if the whole mempool is in contiguous memory.
- */
-#define MEMPOOL_IS_CONTIG(mp)  \
-   ((mp)->pg_num == MEMPOOL_PG_NUM_DEFAULT && \
-   (mp)->phys_addr == (mp)->elt_pa[0])
-
 /* return the header of a mempool object (internal) */
 static inline struct rte_mempool_objhdr *__mempool_get_header(void *obj)
 {
-- 
2.1.4



[dpdk-dev] [RFC 14/35] mempool: store physaddr in mempool objects

2016-03-09 Thread Olivier Matz
Store the physical address of the object in its header. It simplifies
rte_mempool_virt2phy() and prepares the removal of the paddr[] table
in the mempool header.
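
To illustrate (not part of the patch): with the new header field, the
physical address lookup boils down to a header read. The helper name below
is made up; the real accessor is rte_mempool_virt2phy() as reworked in the
diff:

#include <rte_mempool.h>

/* sketch: O(1) physical address lookup from the per-object header */
static phys_addr_t
obj_to_phys(const void *obj)
{
	const struct rte_mempool_objhdr *hdr;

	hdr = (const struct rte_mempool_objhdr *)
		((const char *)obj - sizeof(*hdr));
	return hdr->physaddr;
}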

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 17 +++--
 lib/librte_mempool/rte_mempool.h | 10 ++
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d533484..7aedc89 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -132,19 +132,22 @@ static unsigned optimize_object_size(unsigned obj_size)
 typedef void (*rte_mempool_obj_iter_t)(void * /*obj_iter_arg*/,
void * /*obj_start*/,
void * /*obj_end*/,
-   uint32_t /*obj_index */);
+   uint32_t /*obj_index */,
+   phys_addr_t /*physaddr*/);

 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj)
+mempool_add_elem(struct rte_mempool *mp, void *obj, phys_addr_t physaddr)
 {
struct rte_mempool_objhdr *hdr;
struct rte_mempool_objtlr *tlr __rte_unused;

obj = (char *)obj + mp->header_size;
+   physaddr += mp->header_size;

/* set mempool ptr in header */
hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
hdr->mp = mp;
+   hdr->physaddr = physaddr;
	STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);

 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
@@ -173,6 +176,7 @@ rte_mempool_obj_mem_iter(void *vaddr, uint32_t elt_num, 
size_t total_elt_sz,
uint32_t pgn, pgf;
uintptr_t end, start, va;
uintptr_t pg_sz;
+   phys_addr_t physaddr;

pg_sz = (uintptr_t)1 << pg_shift;
va = (uintptr_t)vaddr;
@@ -208,9 +212,10 @@ rte_mempool_obj_mem_iter(void *vaddr, uint32_t elt_num, 
size_t total_elt_sz,
 * otherwise, just skip that chunk unused.
 */
if (k == pgn) {
+   physaddr = paddr[k] + (start & (pg_sz - 1));
if (obj_iter != NULL)
obj_iter(obj_iter_arg, (void *)start,
-   (void *)end, i);
+   (void *)end, i, physaddr);
va = end;
j += pgf;
i++;
@@ -247,11 +252,11 @@ rte_mempool_obj_iter(struct rte_mempool *mp,

 static void
 mempool_obj_populate(void *arg, void *start, void *end,
-   __rte_unused uint32_t idx)
+   __rte_unused uint32_t idx, phys_addr_t physaddr)
 {
struct rte_mempool *mp = arg;

-   mempool_add_elem(mp, start);
+   mempool_add_elem(mp, start, physaddr);
mp->elt_va_end = (uintptr_t)end;
 }

@@ -355,7 +360,7 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t 
total_elt_sz, uint32_t pg_shift)
  * argument to the end of the object. */
 static void
 mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
-   __rte_unused uint32_t idx)
+   __rte_unused uint32_t idx, __rte_unused phys_addr_t physaddr)
 {
*(uintptr_t *)arg = (uintptr_t)end;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 5b760f0..f32d705 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -158,6 +158,7 @@ struct rte_mempool_objsz {
 struct rte_mempool_objhdr {
STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
struct rte_mempool *mp;  /**< The mempool owning the object. */
+   phys_addr_t physaddr;/**< Physical address of the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
uint64_t cookie; /**< Debug cookie. */
 #endif
@@ -1125,13 +1126,14 @@ rte_mempool_empty(const struct rte_mempool *mp)
  *   The physical address of the elt element.
  */
 static inline phys_addr_t
-rte_mempool_virt2phy(const struct rte_mempool *mp, const void *elt)
+rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void 
*elt)
 {
if (rte_eal_has_hugepages()) {
-   uintptr_t off;
+   const struct rte_mempool_objhdr *hdr;

-   off = (const char *)elt - (const char *)mp->elt_va_start;
-   return mp->elt_pa[off >> mp->pg_shift] + (off & mp->pg_mask);
+   hdr = (const struct rte_mempool_objhdr *)
+   ((const char *)elt - sizeof(*hdr));
+   return hdr->physaddr;
} else {
/*
 * If huge pages are disabled, we cannot assume the
-- 
2.1.4



[dpdk-dev] [RFC 13/35] mempool: create the internal ring in a specific function

2016-03-09 Thread Olivier Matz
This makes the code of rte_mempool_create() clearer, and it will make
the introduction of an external mempool handler easier (in another patch
series). Indeed, this function contains the specific part when a ring is
used, but it could be replaced by something else in the future.

This commit also adds a socket_id field in the mempool structure that
is used by this new function.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 55 +---
 lib/librte_mempool/rte_mempool.h |  1 +
 2 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 4145e2e..d533484 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -431,6 +431,35 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
elt_size,
   MEMPOOL_PG_SHIFT_MAX);
 }

+/* create the internal ring */
+static int
+rte_mempool_ring_create(struct rte_mempool *mp)
+{
+   int rg_flags = 0;
+   char rg_name[RTE_RING_NAMESIZE];
+   struct rte_ring *r;
+
+   snprintf(rg_name, sizeof(rg_name), RTE_MEMPOOL_MZ_FORMAT, mp->name);
+
+   /* ring flags */
+   if (mp->flags & MEMPOOL_F_SP_PUT)
+   rg_flags |= RING_F_SP_ENQ;
+   if (mp->flags & MEMPOOL_F_SC_GET)
+   rg_flags |= RING_F_SC_DEQ;
+
+   /* Allocate the ring that will be used to store objects.
+* Ring functions will return appropriate errors if we are
+* running as a secondary process etc., so no checks made
+* in this function for that condition. */
+   r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
+   mp->socket_id, rg_flags);
+   if (r == NULL)
+   return -rte_errno;
+
+   mp->ring = r;
+   return 0;
+}
+
 /*
  * Create the mempool over already allocated chunk of memory.
  * That external memory buffer can consists of physically disjoint pages.
@@ -447,15 +476,12 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
 {
char mz_name[RTE_MEMZONE_NAMESIZE];
-   char rg_name[RTE_RING_NAMESIZE];
struct rte_mempool_list *mempool_list;
struct rte_mempool *mp = NULL;
struct rte_tailq_entry *te = NULL;
-   struct rte_ring *r = NULL;
const struct rte_memzone *mz;
size_t mempool_size;
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
-   int rg_flags = 0;
void *obj;
struct rte_mempool_objsz objsz;
void *startaddr;
@@ -498,12 +524,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
if (flags & MEMPOOL_F_NO_CACHE_ALIGN)
flags |= MEMPOOL_F_NO_SPREAD;

-   /* ring flags */
-   if (flags & MEMPOOL_F_SP_PUT)
-   rg_flags |= RING_F_SP_ENQ;
-   if (flags & MEMPOOL_F_SC_GET)
-   rg_flags |= RING_F_SC_DEQ;
-
/* calculate mempool object sizes. */
	if (!rte_mempool_calc_obj_size(elt_size, flags, &objsz)) {
rte_errno = EINVAL;
@@ -512,15 +532,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,

rte_rwlock_write_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   /* allocate the ring that will be used to store objects */
-   /* Ring functions will return appropriate errors if we are
-* running as a secondary process etc., so no checks made
-* in this function for that condition */
-   snprintf(rg_name, sizeof(rg_name), RTE_MEMPOOL_MZ_FORMAT, name);
-   r = rte_ring_create(rg_name, rte_align32pow2(n+1), socket_id, rg_flags);
-   if (r == NULL)
-   goto exit_unlock;
-
/*
 * reserve a memory zone for this mempool: private data is
 * cache-aligned
@@ -589,7 +600,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
memset(mp, 0, sizeof(*mp));
snprintf(mp->name, sizeof(mp->name), "%s", name);
mp->phys_addr = mz->phys_addr;
-   mp->ring = r;
+   mp->socket_id = socket_id;
mp->size = n;
mp->flags = flags;
mp->elt_size = objsz.elt_size;
@@ -600,6 +611,9 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
mp->private_data_size = private_data_size;
	STAILQ_INIT(&mp->elt_list);

+   if (rte_mempool_ring_create(mp) < 0)
+   goto exit_unlock;
+
/*
 * local_cache pointer is set even if cache_size is zero.
 * The local_cache points to just past the elt_pa[] array.
@@ -651,7 +665,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,

 exit_unlock:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
-   rte_ring_free(r);
+   if (mp != NULL)
+   rte_ring_free(mp->ring);
rte_free(te);

return NULL;

[dpdk-dev] [RFC 12/35] mempool: use the list to initialize mempool objects

2016-03-09 Thread Olivier Matz
Before this patch, the mempool elements were initialized at the time
they were added to the mempool. This patch changes this to do the
initialization of all objects once the mempool is populated, using
rte_mempool_obj_iter() introduced in previous commits.

Thanks to this modification, we are getting closer to a new API
that would allow us to do:
  mempool_init()
  mempool_populate(mem1)
  mempool_populate(mem2)
  mempool_populate(mem3)
  mempool_init_obj()

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index a9af2fc..4145e2e 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -135,8 +135,7 @@ typedef void (*rte_mempool_obj_iter_t)(void * 
/*obj_iter_arg*/,
uint32_t /*obj_index */);

 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, uint32_t obj_idx,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg)
+mempool_add_elem(struct rte_mempool *mp, void *obj)
 {
struct rte_mempool_objhdr *hdr;
struct rte_mempool_objtlr *tlr __rte_unused;
@@ -153,9 +152,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
tlr = __mempool_get_trailer(obj);
tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-   /* call the initializer */
-   if (obj_init)
-   obj_init(mp, obj_init_arg, obj, obj_idx);

/* enqueue in ring */
rte_ring_sp_enqueue(mp->ring, obj);
@@ -249,37 +245,27 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
  * Populate  mempool with the objects.
  */

-struct mempool_populate_arg {
-   struct rte_mempool *mp;
-   rte_mempool_obj_cb_t   *obj_init;
-   void   *obj_init_arg;
-};
-
 static void
-mempool_obj_populate(void *arg, void *start, void *end, uint32_t idx)
+mempool_obj_populate(void *arg, void *start, void *end,
+   __rte_unused uint32_t idx)
 {
-   struct mempool_populate_arg *pa = arg;
+   struct rte_mempool *mp = arg;

-   mempool_add_elem(pa->mp, start, idx, pa->obj_init, pa->obj_init_arg);
-   pa->mp->elt_va_end = (uintptr_t)end;
+   mempool_add_elem(mp, start);
+   mp->elt_va_end = (uintptr_t)end;
 }

 static void
-mempool_populate(struct rte_mempool *mp, size_t num, size_t align,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg)
+mempool_populate(struct rte_mempool *mp, size_t num, size_t align)
 {
uint32_t elt_sz;
-   struct mempool_populate_arg arg;

elt_sz = mp->elt_size + mp->header_size + mp->trailer_size;
-   arg.mp = mp;
-   arg.obj_init = obj_init;
-   arg.obj_init_arg = obj_init_arg;

mp->size = rte_mempool_obj_mem_iter((void *)mp->elt_va_start,
num, elt_sz, align,
mp->elt_pa, mp->pg_num, mp->pg_shift,
-   mempool_obj_populate, &arg);
+   mempool_obj_populate, mp);
 }

 /* get the header, trailer and total size of a mempool element. */
@@ -648,7 +634,11 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
if (mp_init)
mp_init(mp, mp_init_arg);

-   mempool_populate(mp, n, 1, obj_init, obj_init_arg);
+   mempool_populate(mp, n, 1);
+
+   /* call the initializer */
+   if (obj_init)
+   rte_mempool_obj_iter(mp, obj_init, obj_init_arg);

te->data = (void *) mp;

-- 
2.1.4



[dpdk-dev] [RFC 11/35] mempool: use the list to audit all elements

2016-03-09 Thread Olivier Matz
Use the new rte_mempool_obj_iter() instead of the old
rte_mempool_obj_mem_iter() to iterate over the objects and audit them
(check for cookies).

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 42 +++-
 1 file changed, 7 insertions(+), 35 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 0f7c41f..a9af2fc 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -720,12 +720,6 @@ rte_mempool_dump_cache(FILE *f, const struct rte_mempool 
*mp)
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

-struct mempool_audit_arg {
-   const struct rte_mempool *mp;
-   uintptr_t obj_end;
-   uint32_t obj_num;
-};
-
 /* check and update cookies or panic (internal) */
 void __mempool_check_cookies(const struct rte_mempool *mp,
void * const *obj_table_const, unsigned n, int free)
@@ -795,45 +789,23 @@ void __mempool_check_cookies(const struct rte_mempool *mp,
 }

 static void
-mempool_obj_audit(void *arg, void *start, void *end, uint32_t idx)
+mempool_obj_audit(struct rte_mempool *mp, __rte_unused void *opaque,
+   void *obj, __rte_unused unsigned idx)
 {
-   struct mempool_audit_arg *pa = arg;
-   void *obj;
-
-   obj = (char *)start + pa->mp->header_size;
-   pa->obj_end = (uintptr_t)end;
-   pa->obj_num = idx + 1;
-   __mempool_check_cookies(pa->mp, &obj, 1, 2);
+   __mempool_check_cookies(mp, &obj, 1, 2);
 }

 static void
 mempool_audit_cookies(const struct rte_mempool *mp)
 {
-   uint32_t elt_sz, num;
-   struct mempool_audit_arg arg;
-
-   elt_sz = mp->elt_size + mp->header_size + mp->trailer_size;
-
-   arg.mp = mp;
-   arg.obj_end = mp->elt_va_start;
-   arg.obj_num = 0;
-
-   num = rte_mempool_obj_mem_iter((void *)mp->elt_va_start,
-   mp->size, elt_sz, 1,
-   mp->elt_pa, mp->pg_num, mp->pg_shift,
-   mempool_obj_audit, &arg);
+   unsigned num;

+   num = rte_mempool_obj_iter(RTE_DECONST(void *, mp),
+   mempool_obj_audit, NULL);
if (num != mp->size) {
-   rte_panic("rte_mempool_obj_iter(mempool=%p, size=%u) "
+   rte_panic("rte_mempool_obj_iter(mempool=%p, size=%u) "
"iterated only over %u elements\n",
mp, mp->size, num);
-   } else if (arg.obj_end != mp->elt_va_end || arg.obj_num != mp->size) {
-   rte_panic("rte_mempool_obj_iter(mempool=%p, size=%u) "
-   "last callback va_end: %#tx (%#tx expeceted), "
-   "num of objects: %u (%u expected)\n",
-   mp, mp->size,
-   arg.obj_end, mp->elt_va_end,
-   arg.obj_num, mp->size);
}
 }

-- 
2.1.4



[dpdk-dev] [RFC 10/35] eal: introduce RTE_DECONST macro

2016-03-09 Thread Olivier Matz
This macro removes the const attribute of a variable. It must be used
with care in specific situations. It's better to use this macro instead
of a manual cast, as it explicitly shows the intention of the developer.

This macro is used in the next commit of the series.
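
To illustrate the intended usage (everything except RTE_DECONST below is
made up for the example):

#include <stdio.h>
#include <rte_common.h>

struct my_obj { int refcnt; };

/* a legacy API that does not modify its argument but lacks const */
static void
legacy_dump(struct my_obj *obj)
{
	printf("refcnt=%d\n", obj->refcnt);
}

static void
dump_wrapper(const struct my_obj *obj)
{
	/* RTE_DECONST makes the intentional const removal explicit */
	legacy_dump(RTE_DECONST(struct my_obj *, obj));
}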

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/include/rte_common.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_common.h 
b/lib/librte_eal/common/include/rte_common.h
index 332f2a4..dc0fc83 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -285,6 +285,15 @@ rte_align64pow2(uint64_t v)

 /*** Other general functions / macros /

+/**
+ * Remove the const attribute of a variable
+ *
+ * This must be used with care in specific situations. It's better to
+ * use this macro instead of a manual cast, as it explicitly shows the
+ * intention of the developer.
+ */
+#define RTE_DECONST(type, var) ((type)(uintptr_t)(const void *)(var))
+
 #ifdef __SSE2__
 #include 
 /**
-- 
2.1.4



[dpdk-dev] [RFC 09/35] mempool: use the list to iterate the mempool elements

2016-03-09 Thread Olivier Matz
Now that the mempool objects are chained into a list, we can use it to
browse them. This implies a rework of the rte_mempool_obj_iter() API, which
does not need to take as many arguments as before. The previous function
is kept as a private function, and renamed in this commit. It will be
removed in a next commit of the patch series.

The only internal users of this function are the mellanox drivers. The
code is updated accordingly.

Introducing an API compatibility for this function has been considered,
but it is not easy to do without keeping the old code, as the previous
function could also be used to browse elements that were not added in a
mempool. Moreover, the API is already broken by other patches in this
version.
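
For reference, a small sketch of the reworked iteration API; the callback
prototype follows the driver changes below, the counting helper itself is
made up:

#include <rte_common.h>
#include <rte_mempool.h>

/* called once per object chained in the mempool */
static void
count_one_obj(struct rte_mempool *mp, void *arg, void *obj,
	__rte_unused unsigned idx)
{
	unsigned *count = arg;

	RTE_SET_USED(mp);
	RTE_SET_USED(obj);
	(*count)++;
}

static unsigned
count_mempool_objs(struct rte_mempool *mp)
{
	unsigned count = 0;

	rte_mempool_obj_iter(mp, count_one_obj, &count);
	return count;
}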

Signed-off-by: Olivier Matz 
---
 drivers/net/mlx4/mlx4.c| 53 +++---
 drivers/net/mlx5/mlx5_rxtx.c   | 53 +++---
 drivers/net/mlx5/mlx5_rxtx.h   |  2 +-
 lib/librte_mempool/rte_mempool.c   | 36 ---
 lib/librte_mempool/rte_mempool.h   | 70 --
 lib/librte_mempool/rte_mempool_version.map |  3 +-
 6 files changed, 85 insertions(+), 132 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ee00151..d9b2291 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1265,7 +1265,6 @@ txq_mp2mr(struct txq *txq, const struct rte_mempool *mp)
 }

 struct txq_mp2mr_mbuf_check_data {
-   const struct rte_mempool *mp;
int ret;
 };

@@ -1273,34 +1272,26 @@ struct txq_mp2mr_mbuf_check_data {
  * Callback function for rte_mempool_obj_iter() to check whether a given
  * mempool object looks like a mbuf.
  *
- * @param[in, out] arg
- *   Context data (struct txq_mp2mr_mbuf_check_data). Contains mempool pointer
- *   and return value.
- * @param[in] start
- *   Object start address.
- * @param[in] end
- *   Object end address.
+ * @param[in] mp
+ *   The mempool pointer
+ * @param[in] arg
+ *   Context data (struct txq_mp2mr_mbuf_check_data). Contains the
+ *   return value.
+ * @param[in] obj
+ *   Object address.
  * @param index
- *   Unused.
- *
- * @return
- *   Nonzero value when object is not a mbuf.
+ *   Object index, unused.
  */
 static void
-txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
-uint32_t index __rte_unused)
+txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
+   __rte_unused uint32_t index)
 {
struct txq_mp2mr_mbuf_check_data *data = arg;
-   struct rte_mbuf *buf =
-   (void *)((uintptr_t)start + data->mp->header_size);
+   struct rte_mbuf *buf = obj;

-   (void)index;
/* Check whether mbuf structure fits element size and whether mempool
 * pointer is valid. */
-   if (((uintptr_t)end >= (uintptr_t)(buf + 1)) &&
-   (buf->pool == data->mp))
-   data->ret = 0;
-   else
+   if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
data->ret = -1;
 }

@@ -1314,28 +1305,16 @@ txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
  *   Pointer to TX queue structure.
  */
 static void
-txq_mp2mr_iter(const struct rte_mempool *mp, void *arg)
+txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 {
struct txq *txq = arg;
struct txq_mp2mr_mbuf_check_data data = {
-   .mp = mp,
-   .ret = -1,
+   .ret = 0,
};

-   /* Discard empty mempools. */
-   if (mp->size == 0)
-   return;
/* Register mempool only if the first element looks like a mbuf. */
-   rte_mempool_obj_iter((void *)mp->elt_va_start,
-1,
-mp->header_size + mp->elt_size + mp->trailer_size,
-1,
-mp->elt_pa,
-mp->pg_num,
-mp->pg_shift,
-txq_mp2mr_mbuf_check,
-&data);
-   if (data.ret)
+   if (rte_mempool_obj_iter(mp, txq_mp2mr_mbuf_check, &data) == 0 ||
+   data.ret == -1)
return;
txq_mp2mr(txq, mp);
 }
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index fa5e648..f002ca2 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -193,7 +193,6 @@ txq_mp2mr(struct txq *txq, const struct rte_mempool *mp)
 }

 struct txq_mp2mr_mbuf_check_data {
-   const struct rte_mempool *mp;
int ret;
 };

@@ -201,34 +200,26 @@ struct txq_mp2mr_mbuf_check_data {
  * Callback function for rte_mempool_obj_iter() to check whether a given
  * mempool object looks like a mbuf.
  *
- * @param[in, out] arg
- *   Context data (struct txq_mp2mr_mbuf_check_data). Contains mempool pointer
- *   and return value.
- * @param[in] start
- *   Object start address.
- * @param[in] end
- *   Object end address.
+ * @param[in] mp
+ *   The mempool 

[dpdk-dev] [RFC 08/35] mempool: remove const attribute in mempool_walk

2016-03-09 Thread Olivier Matz
Most functions that can be done on a mempool require a non-const mempool
pointer, except the dump and the audit. Therefore, the mempool_walk()
is more useful if the mempool pointer is not const.

This is required by next commit where the mellanox drivers use
rte_mempool_walk() to iterate the mempools, then rte_mempool_obj_iter()
to iterate the objects in each mempool.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 2 +-
 lib/librte_mempool/rte_mempool.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 1fe102f..237ba69 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -965,7 +965,7 @@ rte_mempool_lookup(const char *name)
return mp;
 }

-void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),
+void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
  void *arg)
 {
struct rte_tailq_entry *te = NULL;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 469bcbc..54a5917 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1304,7 +1304,7 @@ ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t 
elt_num,
  * @param arg
  *   Argument passed to iterator
  */
-void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *arg),
+void rte_mempool_walk(void (*func)(struct rte_mempool *, void *arg),
  void *arg);

 #ifdef __cplusplus
-- 
2.1.4



[dpdk-dev] [RFC 07/35] mempool: list objects when added in the mempool

2016-03-09 Thread Olivier Matz
Introduce a list entry in the object header so that objects can be listed
and browsed. The objective is to introduce a simpler way to browse the
elements of a mempool.

The next commits will update rte_mempool_obj_iter() to use this list,
and remove the previous complex implementation.
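
As an illustration (not part of the patch), the new list makes browsing the
objects of a mempool a simple STAILQ walk; the field names are the ones
added in the diff below:

#include <sys/queue.h>
#include <rte_mempool.h>

static unsigned
count_listed_objs(struct rte_mempool *mp)
{
	struct rte_mempool_objhdr *hdr;
	unsigned n = 0;

	STAILQ_FOREACH(hdr, &mp->elt_list, next)
		n++;
	return n;
}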

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c |  2 ++
 lib/librte_mempool/rte_mempool.h | 15 ---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 83e7ed6..1fe102f 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -138,6 +138,7 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
/* set mempool ptr in header */
hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
hdr->mp = mp;
+   STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);

 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
@@ -585,6 +586,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
mp->cache_size = cache_size;
mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
mp->private_data_size = private_data_size;
+   STAILQ_INIT(&mp->elt_list);

/*
 * local_cache pointer is set even if cache_size is zero.
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index da04021..469bcbc 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -150,11 +150,13 @@ struct rte_mempool_objsz {
  * Mempool object header structure
  *
  * Each object stored in mempools are prefixed by this header structure,
- * it allows to retrieve the mempool pointer from the object. When debug
- * is enabled, a cookie is also added in this structure preventing
- * corruptions and double-frees.
+ * it allows to retrieve the mempool pointer from the object and to
+ * iterate on all objects attached to a mempool. When debug is enabled,
+ * a cookie is also added in this structure preventing corruptions and
+ * double-frees.
  */
 struct rte_mempool_objhdr {
+   STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
struct rte_mempool *mp;  /**< The mempool owning the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
uint64_t cookie; /**< Debug cookie. */
@@ -162,6 +164,11 @@ struct rte_mempool_objhdr {
 };

 /**
+ * A list of object headers type
+ */
+STAILQ_HEAD(rte_mempool_objhdr_list, rte_mempool_objhdr);
+
+/**
  * Mempool object trailer structure
  *
  * In debug mode, each object stored in mempools are suffixed by this
@@ -194,6 +201,8 @@ struct rte_mempool {

struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */

+   struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
+
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
/** Per-lcore statistics. */
struct rte_mempool_debug_stats stats[RTE_MAX_LCORE];
-- 
2.1.4



[dpdk-dev] [RFC 06/35] mempool: update library version

2016-03-09 Thread Olivier Matz
The next changes of this patch series are too heavy to keep a compat
layer, so bump the version number of the library.

Signed-off-by: Olivier Matz 
---
 doc/guides/rel_notes/release_16_04.rst | 2 +-
 lib/librte_mempool/Makefile| 2 +-
 lib/librte_mempool/rte_mempool_version.map | 6 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 8273817..1ef8fa4 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -137,7 +137,7 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_kvargs.so.1
  librte_lpm.so.2
  librte_mbuf.so.2
- librte_mempool.so.1
+   + librte_mempool.so.2
  librte_meter.so.1
  librte_pipeline.so.2
  librte_pmd_bond.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index a6898ef..706f844 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -38,7 +38,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3

 EXPORT_MAP := rte_mempool_version.map

-LIBABIVER := 1
+LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
diff --git a/lib/librte_mempool/rte_mempool_version.map 
b/lib/librte_mempool/rte_mempool_version.map
index 17151e0..8c157d0 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -17,3 +17,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_16.07 {
+   global:
+
+   local: *;
+} DPDK_2.0;
-- 
2.1.4



[dpdk-dev] [RFC 05/35] mempool: rename mempool_obj_ctor_t as mempool_obj_cb_t

2016-03-09 Thread Olivier Matz
In next commits, we will add the ability to populate the
mempool and iterate through objects using the same function.
We will use the same callback type for that. As the callback is
not a constructor anymore, rename it into rte_mempool_obj_cb_t.

The rte_mempool_obj_iter_t that was used to iterate over objects
will be removed in next commits.

No functional change.
In this commit, the API is preserved through a compat typedef.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/mempool_anon.c|  4 ++--
 app/test-pmd/mempool_osdep.h   |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.h  |  2 +-
 drivers/net/xenvirt/rte_mempool_gntalloc.c |  4 ++--
 lib/librte_mempool/rte_dom0_mempool.c  |  2 +-
 lib/librte_mempool/rte_mempool.c   |  8 
 lib/librte_mempool/rte_mempool.h   | 27 ++-
 7 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/app/test-pmd/mempool_anon.c b/app/test-pmd/mempool_anon.c
index 4730432..5e23848 100644
--- a/app/test-pmd/mempool_anon.c
+++ b/app/test-pmd/mempool_anon.c
@@ -86,7 +86,7 @@ struct rte_mempool *
 mempool_anon_create(const char *name, unsigned elt_num, unsigned elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags)
 {
struct rte_mempool *mp;
@@ -190,7 +190,7 @@ mempool_anon_create(__rte_unused const char *name,
__rte_unused unsigned private_data_size,
__rte_unused rte_mempool_ctor_t *mp_init,
__rte_unused void *mp_init_arg,
-   __rte_unused rte_mempool_obj_ctor_t *obj_init,
+   __rte_unused rte_mempool_obj_cb_t *obj_init,
__rte_unused void *obj_init_arg,
__rte_unused int socket_id, __rte_unused unsigned flags)
 {
diff --git a/app/test-pmd/mempool_osdep.h b/app/test-pmd/mempool_osdep.h
index 6b8df68..7ce7297 100644
--- a/app/test-pmd/mempool_osdep.h
+++ b/app/test-pmd/mempool_osdep.h
@@ -48,7 +48,7 @@ struct rte_mempool *
 mempool_anon_create(const char *name, unsigned n, unsigned elt_size,
unsigned cache_size, unsigned private_data_size,
rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
int socket_id, unsigned flags);

 #endif /*_RTE_MEMPOOL_OSDEP_H_ */
diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.h 
b/drivers/net/xenvirt/rte_eth_xenvirt.h
index fc15a63..4995a9b 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.h
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.h
@@ -51,7 +51,7 @@ struct rte_mempool *
 rte_mempool_gntalloc_create(const char *name, unsigned elt_num, unsigned 
elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags);


diff --git a/drivers/net/xenvirt/rte_mempool_gntalloc.c 
b/drivers/net/xenvirt/rte_mempool_gntalloc.c
index 7bfbfda..69b9231 100644
--- a/drivers/net/xenvirt/rte_mempool_gntalloc.c
+++ b/drivers/net/xenvirt/rte_mempool_gntalloc.c
@@ -78,7 +78,7 @@ static struct _mempool_gntalloc_info
 _create_mempool(const char *name, unsigned elt_num, unsigned elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags)
 {
struct _mempool_gntalloc_info mgi;
@@ -253,7 +253,7 @@ struct rte_mempool *
 rte_mempool_gntalloc_create(const char *name, unsigned elt_num, unsigned 
elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags)
 {
int rv;
diff --git a/lib/librte_mempool/rte_dom0_mempool.c 
b/lib/librte_mempool/rte_dom0_mempool.c
index 0d6d750..0051bd5 100644
--- a/lib/librte_mempool/rte_dom0_mempool.c
+++ b/lib/librte_mempool/rte_dom0_mempool.c
@@ -83,7 +83,7 @@ struct rte_mempool *
 rte_dom0_mempool_create(const char *name, unsigned elt_num, unsigned elt_size,
unsigned cache_size, unsigned private_data_size,
rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+   rte_mempool_obj_cb_t 

[dpdk-dev] [RFC 04/35] mempool: use sizeof to get the size of header and trailer

2016-03-09 Thread Olivier Matz
Since commits d2e0ca22f and 97e7e685b the headers and trailers
of the mempool are defined as a structure. We can get their
size using a sizeof instead of doing a calculation that will
become wrong at the first structure update.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 8188442..ce0470d 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -264,24 +264,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,

	sz = (sz != NULL) ? sz : &lsz;

-   /*
-* In header, we have at least the pointer to the pool, and
-* optionaly a 64 bits cookie.
-*/
-   sz->header_size = 0;
-   sz->header_size += sizeof(struct rte_mempool *); /* ptr to pool */
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-   sz->header_size += sizeof(uint64_t); /* cookie */
-#endif
+   sz->header_size = sizeof(struct rte_mempool_objhdr);
if ((flags & MEMPOOL_F_NO_CACHE_ALIGN) == 0)
sz->header_size = RTE_ALIGN_CEIL(sz->header_size,
RTE_MEMPOOL_ALIGN);

-   /* trailer contains the cookie in debug mode */
-   sz->trailer_size = 0;
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-   sz->trailer_size += sizeof(uint64_t); /* cookie */
-#endif
+   sz->trailer_size = sizeof(struct rte_mempool_objtlr);
+
/* element size is 8 bytes-aligned at least */
sz->elt_size = RTE_ALIGN_CEIL(elt_size, sizeof(uint64_t));

-- 
2.1.4



[dpdk-dev] [RFC 03/35] mempool: uninline function to check cookies

2016-03-09 Thread Olivier Matz
There's no reason to keep this function inlined. Move it to
rte_mempool.c.

Note: we don't see it in the patch, but the #pragma ignoring
"-Wcast-qual" is still there in the C file.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 68 +++
 lib/librte_mempool/rte_mempool.h | 77 ++--
 2 files changed, 71 insertions(+), 74 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 25181d4..8188442 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -709,6 +709,74 @@ struct mempool_audit_arg {
uint32_t obj_num;
 };

+/* check and update cookies or panic (internal) */
+void __mempool_check_cookies(const struct rte_mempool *mp,
+   void * const *obj_table_const, unsigned n, int free)
+{
+   struct rte_mempool_objhdr *hdr;
+   struct rte_mempool_objtlr *tlr;
+   uint64_t cookie;
+   void *tmp;
+   void *obj;
+   void **obj_table;
+
+   /* Force to drop the "const" attribute. This is done only when
+* DEBUG is enabled */
+   tmp = (void *) obj_table_const;
+   obj_table = (void **) tmp;
+
+   while (n--) {
+   obj = obj_table[n];
+
+   if (rte_mempool_from_obj(obj) != mp)
+   rte_panic("MEMPOOL: object is owned by another "
+ "mempool\n");
+
+   hdr = __mempool_get_header(obj);
+   cookie = hdr->cookie;
+
+   if (free == 0) {
+   if (cookie != RTE_MEMPOOL_HEADER_COOKIE1) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 
"\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad header cookie (put)\n");
+   }
+   hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
+   }
+   else if (free == 1) {
+   if (cookie != RTE_MEMPOOL_HEADER_COOKIE2) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 
"\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad header cookie (get)\n");
+   }
+   hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE1;
+   }
+   else if (free == 2) {
+   if (cookie != RTE_MEMPOOL_HEADER_COOKIE1 &&
+   cookie != RTE_MEMPOOL_HEADER_COOKIE2) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 
"\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad header cookie 
(audit)\n");
+   }
+   }
+   tlr = __mempool_get_trailer(obj);
+   cookie = tlr->cookie;
+   if (cookie != RTE_MEMPOOL_TRAILER_COOKIE) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 "\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad trailer cookie\n");
+   }
+   }
+}
+
 static void
 mempool_obj_audit(void *arg, void *start, void *end, uint32_t idx)
 {
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index ca4657f..6d98cdf 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -296,6 +296,7 @@ static inline struct rte_mempool_objtlr 
*__mempool_get_trailer(void *obj)
return (struct rte_mempool_objtlr *)RTE_PTR_ADD(obj, mp->elt_size);
 }

+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 /**
  * @internal Check and update cookies or panic.
  *
@@ -310,80 +311,8 @@ static inline struct rte_mempool_objtlr 
*__mempool_get_trailer(void *obj)
  *   - 1: object is supposed to be free, mark it as allocated
  *   - 2: just check that cookie is valid (free or allocated)
  */
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-static inline void __mempool_check_cookies(const struct rte_mempool *mp,
-  void * const *obj_table_const,
-  unsigned n, int free)
-{
-   struct rte_mempool_objhdr *hdr;
-   struct rte_mempool_objtlr *tlr;
-   uint64_t cookie;
-   void *tmp;
-   void *obj;
-   void **obj_table;
-
-   

[dpdk-dev] [RFC 02/35] mempool: replace elt_size by total_elt_size

2016-03-09 Thread Olivier Matz
In some mempool functions, we use the size of the elements as arguments or in
variables. There is a confusion between the size including or not including
the header and trailer.

To avoid this confusion:
- update the API documentation
- rename the variables and argument names as "elt_size" when the size does not
  include the header and trailer, or else as "total_elt_size".
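
To make the convention concrete, a small illustrative helper (the fields
exist in struct rte_mempool, the helper itself is made up):

#include <rte_mempool.h>

/* "elt_size" is the object body only; the total size also counts the
 * header and trailer placed around each object by the library. */
static size_t
mempool_total_elt_size(const struct rte_mempool *mp)
{
	return (size_t)mp->header_size + mp->elt_size + mp->trailer_size;
}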

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 21 +++--
 lib/librte_mempool/rte_mempool.h | 19 +++
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 6db02ee..25181d4 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -156,13 +156,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
  *
  * Given the pointer to the memory, and its topology in physical memory
  * (the physical addresses table), iterate through the "elt_num" objects
- * of size "total_elt_sz" aligned at "align". For each object in this memory
+ * of size "elt_sz" aligned at "align". For each object in this memory
  * chunk, invoke a callback. It returns the effective number of objects
  * in this memory. */
 uint32_t
-rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
align,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
-   rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
+rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
+   size_t align, const phys_addr_t paddr[], uint32_t pg_num,
+   uint32_t pg_shift, rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
 {
uint32_t i, j, k;
uint32_t pgn, pgf;
@@ -178,7 +178,7 @@ rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t 
elt_sz, size_t align,
while (i != elt_num && j != pg_num) {

start = RTE_ALIGN_CEIL(va, align);
-   end = start + elt_sz;
+   end = start + total_elt_sz;

/* index of the first page for the next element. */
pgf = (end >> pg_shift) - (start >> pg_shift);
@@ -255,6 +255,7 @@ mempool_populate(struct rte_mempool *mp, size_t num, size_t 
align,
	mempool_obj_populate, &arg);
 }

+/* get the header, trailer and total size of a mempool element. */
 uint32_t
 rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
struct rte_mempool_objsz *sz)
@@ -332,17 +333,17 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,
  * Calculate maximum amount of memory required to store given number of 
objects.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, uint32_t pg_shift)
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
 {
size_t n, pg_num, pg_sz, sz;

pg_sz = (size_t)1 << pg_shift;

-   if ((n = pg_sz / elt_sz) > 0) {
+   if ((n = pg_sz / total_elt_sz) > 0) {
pg_num = (elt_num + n - 1) / n;
sz = pg_num << pg_shift;
} else {
-   sz = RTE_ALIGN_CEIL(elt_sz, pg_sz) * elt_num;
+   sz = RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;
}

return sz;
@@ -362,7 +363,7 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
void *end,
  * given memory footprint to store required number of elements.
  */
 ssize_t
-rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
+rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
 {
uint32_t n;
@@ -373,7 +374,7 @@ rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, 
size_t elt_sz,
va = (uintptr_t)vaddr;
uv = va;

-   if ((n = rte_mempool_obj_iter(vaddr, elt_num, elt_sz, 1,
+   if ((n = rte_mempool_obj_iter(vaddr, elt_num, total_elt_sz, 1,
paddr, pg_num, pg_shift, mempool_lelem_iter,
	&uv)) != elt_num) {
return -(ssize_t)n;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index bd78df5..ca4657f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1289,7 +1289,7 @@ struct rte_mempool *rte_mempool_lookup(const char *name);
  * calculates header, trailer, body and total sizes of the mempool object.
  *
  * @param elt_size
- *   The size of each element.
+ *   The size of each element, without header and trailer.
  * @param flags
  *   The flags used for the mempool creation.
  *   Consult rte_mempool_create() for more information about possible values.
@@ -1315,14 +1315,15 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, 
uint32_t flags,
  *
  * @param elt_num
  *   Number of elements.
- * @param elt_sz
- *   The size of each element.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by 

[dpdk-dev] [RFC 01/35] mempool: fix comments and style

2016-03-09 Thread Olivier Matz
No functional change, just fix some comments and styling issues.
Also avoid duplicating comments between rte_mempool_create()
and rte_mempool_xmem_create().

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 17 +---
 lib/librte_mempool/rte_mempool.h | 59 +---
 2 files changed, 26 insertions(+), 50 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 73ca770..6db02ee 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -152,6 +152,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
rte_ring_sp_enqueue(mp->ring, obj);
 }

+/* Iterate through objects at the given address
+ *
+ * Given the pointer to the memory, and its topology in physical memory
+ * (the physical addresses table), iterate through the "elt_num" objects
+ * of size "total_elt_sz" aligned at "align". For each object in this memory
+ * chunk, invoke a callback. It returns the effective number of objects
+ * in this memory. */
 uint32_t
 rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
align,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
@@ -341,10 +348,8 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, 
uint32_t pg_shift)
return sz;
 }

-/*
- * Calculate how much memory would be actually required with the
- * given memory footprint to store required number of elements.
- */
+/* Callback used by rte_mempool_xmem_usage(): it sets the opaque
+ * argument to the end of the object. */
 static void
 mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
__rte_unused uint32_t idx)
@@ -352,6 +357,10 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
void *end,
*(uintptr_t *)arg = (uintptr_t)end;
 }

+/*
+ * Calculate how much memory would be actually required with the
+ * given memory footprint to store required number of elements.
+ */
 ssize_t
 rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8595e77..bd78df5 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -214,7 +214,7 @@ struct rte_mempool {

 }  __rte_cache_aligned;

-#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread in memory. */
+#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread among memory 
channels. */
 #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache 
lines.*/
 #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
"single-producer".*/
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
"single-consumer".*/
@@ -270,7 +270,8 @@ struct rte_mempool {
 /* return the header of a mempool object (internal) */
 static inline struct rte_mempool_objhdr *__mempool_get_header(void *obj)
 {
-   return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj, sizeof(struct 
rte_mempool_objhdr));
+   return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj,
+   sizeof(struct rte_mempool_objhdr));
 }

 /**
@@ -544,8 +545,9 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
elt_size,
 /**
  * Create a new mempool named *name* in memory.
  *
- * This function uses ``memzone_reserve()`` to allocate memory. The
- * pool contains n elements of elt_size. Its size is set to n.
+ * The pool contains n elements of elt_size. Its size is set to n.
+ * This function uses ``memzone_reserve()`` to allocate the mempool header
+ * (and the objects if vaddr is NULL).
  * Depending on the input parameters, mempool elements can be either allocated
  * together with the mempool header, or an externally provided memory buffer
  * could be used to store mempool objects. In later case, that external
@@ -560,18 +562,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
elt_size,
  * @param elt_size
  *   The size of each element.
  * @param cache_size
- *   If cache_size is non-zero, the rte_mempool library will try to
- *   limit the accesses to the common lockless pool, by maintaining a
- *   per-lcore object cache. This argument must be lower or equal to
- *   CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE. It is advised to choose
- *   cache_size to have "n modulo cache_size == 0": if this is
- *   not the case, some elements will always stay in the pool and will
- *   never be used. The access to the per-lcore table is of course
- *   faster than the multi-producer/consumer pool. The cache can be
- *   disabled if the cache_size argument is set to 0; it can be useful to
- *   avoid losing objects in cache. Note that even if not used, the
- *   memory space for cache is always reserved in a mempool structure,
- *   except if CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE is set to 0.
+ *   Size of the cache. See rte_mempool_create() for details.
  * @param private_data_size
  *   

[dpdk-dev] [RFC 00/35] mempool: rework memory allocation

2016-03-09 Thread Olivier Matz
This series is a rework of mempool. For those who don't want to read
all the cover letter, here is a summary:

- it is not possible to allocate large mempools if there is not enough
  contiguous memory, this series solves this issue
- introduce new APIs with fewer arguments: "create, populate, obj_init"
  (a rough usage sketch follows after this list)
- allow to free a mempool
- split code in smaller functions, will ease the introduction of ext_handler
- remove test-pmd anonymous mempool creation
- remove most of dom0-specific mempool code
- opens the door for an eal_memory rework: we probably don't need large
  contiguous memory areas anymore; working with pages would be enough.
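
As referenced in the summary above, here is a very rough sketch of the
intended "create, populate, obj_init" flow. All function names and
signatures are illustrative assumptions based on this series, not a final
API:

#include <rte_mempool.h>

static void
my_obj_init(struct rte_mempool *mp, void *arg, void *obj, unsigned idx)
{
	/* per-object initialization goes here */
	(void)mp; (void)arg; (void)obj; (void)idx;
}

static struct rte_mempool *
make_pool(unsigned n, unsigned elt_size, int socket_id,
	char *vaddr, phys_addr_t paddr, size_t len)
{
	struct rte_mempool *mp;

	/* 1) create an empty pool, no object memory attached yet */
	mp = rte_mempool_create_empty("pool", n, elt_size,
		0 /* cache */, 0 /* priv */, socket_id, 0 /* flags */);
	if (mp == NULL)
		return NULL;

	/* 2) populate it, possibly several times with disjoint chunks */
	if (rte_mempool_populate_phys(mp, vaddr, paddr, len) < 0)
		return NULL;

	/* 3) initialize the objects once the pool is populated */
	rte_mempool_obj_iter(mp, my_obj_init, NULL);
	return mp;
}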

This will clearly break the ABI, but as there are already 2 other changes that
will break it for 16.07, the target for this series is 16.07. I plan to send a
deprecation notice for 16.04 soon.

The API stays almost the same: no modification is needed in the example apps
or in test-pmd. Only the kni and mellanox drivers are slightly modified.

Description of the initial issue


The allocation of mbuf pool can fail even if there is enough memory.
The problem is related to the way the memory is allocated and used in
dpdk. It is particularly annoying with mbuf pools, but it can also fail
in other use cases allocating a large amount of memory.

- rte_malloc() allocates physically contiguous memory, which is needed
  for mempools, but useless most of the time.

  Allocating a large physically contiguous zone is often impossible
  because the system provides hugepages, which may not be contiguous.

- rte_mempool_create() (and therefore rte_pktmbuf_pool_create())
  requires a physically contiguous zone.

- rte_mempool_xmem_create() does not solve the issue as it still
  needs the memory to be virtually contiguous, and there is no
  way in dpdk to allocate a virtually contiguous memory that is
  not also physically contiguous.

How to reproduce the issue
--

- start the dpdk with some 2MB hugepages (it can also occur with 1GB)
- allocate a large mempool
- even if there is enough memory, the allocation can fail

Example:

  git clone http://dpdk.org/git/dpdk
  cd dpdk
  make config T=x86_64-native-linuxapp-gcc
  make -j32
  mkdir -p /mnt/huge
  mount -t hugetlbfs nodev /mnt/huge
  echo 256 > 
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

  # we try to allocate a mempool whose size is ~450MB, it fails
  ./build/app/testpmd -l 2,4 -- --total-num-mbufs=20 -i

The EAL logs "EAL: Virtual area found at..." show that there are
several zones, all of them smaller than 450MB.

Workarounds:

- Use 1GB hugepages: it sometimes works, but for very large
  pools (millions of mbufs) there is the same issue. Moreover,
  it would consume 1GB memory at least which can be a lot
  in some cases.

- Reboot the machine or allocate hugepages at boot time: this increases
  the chances of having more contiguous memory, but does not completely
  solve the issue

Solutions
-

Below is a list of proposed solutions. I implemented a quick and dirty
PoC of solution 1, but it does not work in all conditions and it's
really an ugly hack. This series implements solution 4, which looks
the best to me, knowing that it does not prevent further enhancements
to dpdk memory in the future (solution 3 for instance).

Solution 1: in application
--

- allocate several hugepages using rte_malloc() or rte_memzone_reserve()
  (only keeping complete hugepages)
- parse memsegs and /proc/maps to check which files mmaps these pages
- mmap the files in a contiguous virtual area
- use rte_mempool_xmem_create()

Cons:

- 1a. parsing the memsegs of rte config in the application does not
  use a public API, and can be broken if internal dpdk code changes
- 1b. some memory is lost due to malloc headers. Also, if the memory is
  very fragmented (ex: all 2MB pages are physically separated), it does
  not work at all because we cannot get any complete page. It is not
  possible to use a lower level allocator since commit fafcc11985a.
- 1c. we cannot use rte_pktmbuf_pool_create(), so we need to use mempool
  api and do a part of the job manually
- 1d. it breaks secondary processes as the virtual addresses won't be
  mmap'd at the same place in secondary process
- 1e. it only fixes the issue for the mbuf pool of the application,
  internal pools in dpdk libraries are not modified
- 1f. this is a pure linux solution (rte_map files)
- 1g. The application has to be aware of RTE_EAL_SINGLE_SEGMENTS option
  that changes the way hugepages are mapped. By the way, it's strange
  to have such a compile-time option, we should probably have only
  one behavior that works all the time.

Solution 2: in dpdk memory allocator


- do the same than solution 1 in a new function rte_malloc_non_contig():
  allocate several chunks and mmap them in a contiguous virtual memory
- a flag has to be added in malloc header to do the proper cleanup 

[dpdk-dev] [PATCH] doc: fix API change in release note

2016-03-09 Thread Thomas Monjalon
2016-03-09 19:59, Jingjing Wu:
> Move the structure ``rte_eth_fdir_masks`` change announcement from ABI
> to API in release note.
> 
> Fixes: 1409f127d7f1 (ethdev: fix byte order consistency of flow director)
> Signed-off-by: Jingjing Wu 

Applied, thanks


[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Thomas Monjalon
2016-03-09 15:42, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-03-09 15:23, Ananyev, Konstantin:
> > > >
> > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > +   if (to_send == 0)
> > > > > > > +   return 0;
> > > > > >
> > > > > > Why this check is done in the lib?
> > > > > > What is the performance gain if we are idle?
> > > > > > It can be done outside if needed.
> > > > >
> > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > why not to put it inside?
> > > > > I don't expect any performance gain/loss because of that -
> > > > > just seems a bit more convenient to the user.
> > > >
> > > > It is handling an idle case so there is no gain obviously.
> > > > But the condition branching is surely a loss.
> > >
> > > I suppose that condition should always be checked:
> > > either in user code prior to function call or inside the
> > > function call itself.
> > > So don't expect any difference in performance here...
> > > Do you have any particular example when you think it would?
> > > Or are you talking about rte_eth_tx_buffer() calling
> > > rte_eth_tx_buffer_flush() internally?
> > > For that one - both are flush is 'static inline' , so I expect
> > > compiler be smart enough to remove this redundant check.
> > >
> > > > So why the user would you like to do this check?
> > > Just for user convenience - to save him doing that manually.
> > 
> > Probably I've missed something. If we remove this check, the function
> > will do nothing, right? How is it changing the behaviour?
> 
> If we'll remove that check, then 
> rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
> will be called.
> So in that case it might be even slower, as we'll have to do a proper call.

If there is no packet, we have time to do a useless call.

> Of course user can avoid it by:
> 
> If(tx_buffer->nb_pkts != 0)
>   rte_eth_tx_buffer_flush(port, queue, tx_buffer);
> 
> But as I said what for to force user to do that?
> Why not to  make this check inside the function?

Because it may be slower when there are some packets
and will "accelerate" only the no-packet case.

We do not progress in this discussion. It is not a big deal, just nonsense.
So I agree to keep it if we change the website to announce that DPDK
accelerates the idle processing ;)


[dpdk-dev] Client Server Application using DPDK API

2016-03-09 Thread Remy Horton
'noon,

On 09/03/2016 08:45, Vivek Gupta wrote:
> Hi
>
> I want to write a Client Server application using DPDK API on a
> single machine. What are the basic building block for that. How can
> we write such application?

examples/l2fwd/main.c and examples/ethtool/ethtool-app/main.c are 
probably the easier examples to follow. In terms of function calls, it 
is pretty much:

rte_eal_init(..);
for (each port) {
rte_pktmbuf_pool_create(..);
rte_eth_dev_configure(..);
rte_eth_dev_rx_queue_setup(..);
rte_eth_dev_tx_queue_setup(..);
rte_eth_dev_start(..);
}
while(1) {
rte_eth_rx_burst(..); /* incoming frames */
rte_eth_tx_burst(..); /* outgoing frames */
}
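
To make the main loop a little more concrete, here is a minimal forwarding
sketch (port/queue ids and burst size are illustrative; setup and error
handling omitted):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void
forward_loop(uint8_t rx_port, uint8_t tx_port)
{
	struct rte_mbuf *bufs[BURST_SIZE];
	uint16_t nb_rx, nb_tx, i;

	for (;;) {
		nb_rx = rte_eth_rx_burst(rx_port, 0, bufs, BURST_SIZE);
		if (nb_rx == 0)
			continue;
		nb_tx = rte_eth_tx_burst(tx_port, 0, bufs, nb_rx);
		/* free anything the NIC could not accept */
		for (i = nb_tx; i < nb_rx; i++)
			rte_pktmbuf_free(bufs[i]);
	}
}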

Bear in mind that DPDK deals with MAC frames rather than higher level IP 
packets, which may be an issue if you intend to use TCP/IP based 
application protocols.


> ::DISCLAIMER::

Avoid using confidentality disclaimers on mailing list emails. It tends 
to "annoy" people.. :)

Regards,

..Remy


[dpdk-dev] [PATCH v3 4/4] mempool: add in the RTE_NEXT_ABI for ABI breakages

2016-03-09 Thread Hunt, David
Hi Olivier,

On 3/9/2016 4:31 PM, Olivier MATZ wrote:
> Hi David,
>
> On 03/09/2016 05:28 PM, Hunt, David wrote:
>
>> Sure, v4 will remove the NEXT_ABI patch , and replace it with just the
>> ABI break announcement for 16.07. For anyone who what's to try out the
>> patch, they can always get it from patchwork, but not as part 16.04.
> I think it's better to have the deprecation notice in a separate
> mail, outside of the patch series, so Thomas can just apply this
> one and let the series pending for 16.07.
>
> Thanks,
> Olivier

Yes, sure, makes perfect sense.

Thanks,
David.



[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Thomas Monjalon
2016-03-09 15:32, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-03-09 15:23, Ananyev, Konstantin:
> > > >
> > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > +   if (to_send == 0)
> > > > > > > +   return 0;
> > > > > >
> > > > > > Why this check is done in the lib?
> > > > > > What is the performance gain if we are idle?
> > > > > > It can be done outside if needed.
> > > > >
> > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > why not to put it inside?
> > > > > I don't expect any performance gain/loss because of that - just
> > > > > seems a bit more convenient to the user.
> > > >
> > > > It is handling an idle case so there is no gain obviously.
> > > > But the condition branching is surely a loss.
> > >
> > > I suppose that condition should always be checked:
> > > either in user code prior to function call or inside the function call
> > > itself.
> > > So don't expect any difference in performance here...
> > > Do you have any particular example when you think it would?
> > > Or are you talking about rte_eth_tx_buffer() calling
> > > rte_eth_tx_buffer_flush() internally?
> > > For that one - both are flush is 'static inline' , so I expect
> > > compiler be smart enough to remove this redundant check.
> > >
> > > > So why the user would you like to do this check?
> > > Just for user convenience - to save him doing that manually.
> > 
> > Probably I've missed something. If we remove this check, the function will
> > do nothing, right? How is it changing the behaviour?
> 
> If we remove this check, function will try to send 0 packets and check
> condition for error. So we gain nothing with removing that.

Actually I should not arguing why removing it,
but you should arguing why adding it :)


[dpdk-dev] Kernel NIC Interface and SO_TIMESTAMPING/PTP Hardware Clock

2016-03-09 Thread Ralf Grosse Boerger
Hi,

I am planning to use a PTP daemon (http://linuxptp.sourceforge.net/) on an
Ethernet port that is used by DPDK.
The kernel NIC interface should allow me to pass PTP frames from/to the
Linux Ethernet interface (via rte_kni_tx_burst() and rte_kni_rx_burst()).

But the PTP daemon also requires the SO_TIMESTAMPING/PTP Hardware Clock API
of the Linux Ethernet device.
If I start the ptp4l daemon on a KNI device (for example vEth0_0) I
get an error that the hardware timestamping mode is not supported.

Can anyone point me to the relevant source files of the KNI driver
that need to be modified to add HW timestamping and PHC support to a
KNI device?
(I am using an Intel i210 card).

Thanks in advance
 Ralf


[dpdk-dev] [PATCH] hash: fix memcmp function pointer in multi-process environment

2016-03-09 Thread Dhananjaya Reddy Eadala
Hi Michael

If you agree with the #ifdef protection I explained in my previous mail, I
will re-submit the patch, refactoring the commit log so that each line is
less than 80 characters.

Thanks
Dhana


On Thu, Mar 3, 2016 at 8:00 PM, Dhananjaya Reddy Eadala 
wrote:

> Hi Michael
>
> Please see my answers to your comments here.
>
> 1. Sure, I will refactor the commit log to restrict not more than 80
> characters.
>
> 2. Not sure how we can ifdef at the location you mentioned. Can you please
> elaborate more on this.
> We already have similar ifdef protection to what you suggested and
> with that protection memcmp is assigned.
> Problem is in using the function pointer to call the compare function.
> So we need protection for invoking compare function, under
> multi-process environment.
>
> 3. I couldn't come up with any other idea to protect this function pointer
> invocation.
>
> Thanks
> Dhana
>
>
>
>
> On Thu, Mar 3, 2016 at 12:44 AM, Qiu, Michael 
> wrote:
>
>> On 3/3/2016 11:36 AM, Dhana Eadala wrote:
>> > We found a problem in dpdk-2.2 using under multi-process environment.
>> > Here is the brief description how we are using the dpdk:
>> >
>> > We have two processes proc1, proc2 using dpdk. These proc1 and proc2
>> are two different compiled binaries.
>> > proc1 is started as primary process and proc2 as secondary process.
>> >
>> > proc1:
>> > Calls srcHash = rte_hash_create("src_hash_name") to create rte_hash
>> structure.
>> > As part of this, this api initalized the rte_hash structure and set the
>> srcHash->rte_hash_cmp_eq to the address of memcmp() from proc1 address
>> space.
>> >
>> > proc2:
>> > calls srcHash =  rte_hash_find_existing("src_hash_name"). This returns
>> the rte_hash created by proc1.
>> > This srcHash->rte_hash_cmp_eq still points to the address of memcmp()
>> from proc1 address space.
>> > Later proc2  calls rte_hash_lookup_with_hash(srcHash, (const void*)
>> &key, key.sig);
>> > Under the hood, rte_hash_lookup_with_hash() invokes
>> __rte_hash_lookup_with_hash(), which in turn calls h->rte_hash_cmp_eq(key,
>> k->key, h->key_len).
>> > This leads to a crash as h->rte_hash_cmp_eq is an address from proc1
>> address space and is invalid address in proc2 address space.
>> >
>> > We found, from dpdk documentation, that
>> >
>> > "
>> >  The use of function pointers between multiple processes running based
>> of different compiled
>> >  binaries is not supported, since the location of a given function in
>> one process may be different to
>> >  its location in a second. This prevents the librte_hash library from
>> behaving properly as in a  multi-
>> >  threaded instance, since it uses a pointer to the hash function
>> internally.
>> >
>> >  To work around this issue, it is recommended that multi-process
>> applications perform the hash
>> >  calculations by directly calling the hashing function from the code
>> and then using the
>> >  rte_hash_add_with_hash()/rte_hash_lookup_with_hash() functions instead
>> of the functions which do
>> >  the hashing internally, such as rte_hash_add()/rte_hash_lookup().
>> > "
>> >
>> > We did follow the recommended steps by invoking
>> rte_hash_lookup_with_hash().
>> > It was no issue up to and including dpdk-2.0. In later releases started
>> crashing because rte_hash_cmp_eq is introduced in dpdk-2.1
>> >
>> > We fixed it with the following patch and would like to submit the patch
>> to dpdk.org.
>> > Patch is created such that, if anyone wanted to use dpdk in
>> multi-process environment with function pointers not shared, they need to
>> > define RTE_LIB_MP_NO_FUNC_PTR in their Makefile. Without defining this
>> flag in Makefile, it works as it is now.
>> >
>> > Signed-off-by: Dhana Eadala 
>> > ---
>> >
>>
>> Some comments:
>>
>> 1.  your commit log need to refactor, better to limit every line less
>> than 80 character.
>>
>> 2. I think you could add the ifdef here in
>> lib/librte_hash/rte_cuckoo_hash.c :
>> /*
>>  * If x86 architecture is used, select appropriate compare function,
>>  * which may use x86 instrinsics, otherwise use memcmp
>>  */
>> #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686) ||\
>>  defined(RTE_ARCH_X86_X32) || defined(RTE_ARCH_ARM64)
>> /* Select function to compare keys */
>> switch (params->key_len) {
>> case 16:
>> h->rte_hash_cmp_eq = rte_hash_k16_cmp_eq;
>> break;
>> [...]
>> break;
>> default:
>> /* If key is not multiple of 16, use generic memcmp */
>> h->rte_hash_cmp_eq = memcmp;
>> }
>> #else
>> h->rte_hash_cmp_eq = memcmp;
>> #endif
>>
>> So that could remove other #ifdef in those lines.
>>
>> 3. I don't think ask others to write RTE_LIB_MP_NO_FUNC_PTR in makefile
>> is a good idea, if you really want to do that, please add a doc so that
>> others could know it.
>>
>> Thanks,
>> Michael
>>
>
>
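For readers following this thread, here is a rough sketch of the lookup pattern described above (the key structure, seed value and the choice of rte_jhash() are purely illustrative and not part of the patch). The signature is computed in the application, as the documentation recommends, yet rte_hash_lookup_with_hash() still dereferences the rte_hash_cmp_eq pointer stored by the primary process, which is exactly what crashes in the secondary and what the proposed RTE_LIB_MP_NO_FUNC_PTR build option is meant to avoid:

    #include <rte_hash.h>
    #include <rte_jhash.h>

    struct flow_key {                      /* hypothetical application key */
        uint32_t src_ip;
        uint32_t dst_ip;
        uint16_t src_port;
        uint16_t dst_port;
    };

    #define FLOW_HASH_SEED 0x12345678      /* must match the primary process */

    /* Secondary process: look up a key in a table created by the primary. */
    static int32_t
    lookup_flow(const struct rte_hash *h, const struct flow_key *key)
    {
        /* Hash computed locally, so no hash-function pointer is used ... */
        hash_sig_t sig = rte_jhash(key, sizeof(*key), FLOW_HASH_SEED);

        /* ... but the compare-function pointer inside 'h' is still called. */
        return rte_hash_lookup_with_hash(h, key, sig);
    }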


[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Kulasek, TomaszX


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, March 8, 2016 23:52
> To: Kulasek, TomaszX 
> Cc: dev at dpdk.org; Ananyev, Konstantin 
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> Hi,
> 

[...]

> > +/**
> > + * Callback function for tracking unsent buffered packets.
> > + *
> > + * This function can be passed to
> > +rte_eth_tx_buffer_set_err_callback() to
> > + * adjust the default behaviour when buffered packets cannot be sent.
> > +This
> > + * function drops any unsent packets, but also updates a
> > +user-supplied counter
> > + * to track the overall number of packets dropped. The counter should
> > +be an
> > + * uint64_t variable.
> > + *
> > + * NOTE: this function should not be called directly, instead it should
> be used
> > + *   as a callback for packet buffering.
> > + *
> > + * NOTE: when configuring this function as a callback with
> > + *   rte_eth_tx_buffer_set_err_callback(), the final, userdata
> parameter
> > + *   should point to an uint64_t value.
> 
> Please forget this idea of counter in the default callback.
> 

Ok, I forgot.

> [...]
> > +void
> > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t
> unsent,
> > +   void *userdata);
> 
> What about rte_eth_tx_buffer_default_callback as name?

This function is currently used as the default to silently count dropped
packets and update the error counter in the tx_buffer structure. If I remove
the error counter and make silent dropping the default behaviour, it is better
to have two callbacks to choose from:

1) silently dropping packets (set as default)
2) dropping packets while updating a counter, as defined above.

Maybe it is better to define two default callbacks, since many applications
can still update their internal error counters. So IMHO these names are more
descriptive:

rte_eth_tx_buffer_drop_callback
rte_eth_tx_buffer_count_callback

What do you think?

Tomasz
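
PS: just to illustrate, a custom callback matching the prototype quoted above
is tiny; a rough sketch (the uint64_t counter is application-defined, and it
would be registered through rte_eth_tx_buffer_set_err_callback() with a
pointer to that counter as userdata):

    #include <rte_mbuf.h>

    /* Drop the unsent packets and count them in a user-supplied uint64_t. */
    static void
    my_count_unsent_cb(struct rte_mbuf **pkts, uint16_t unsent, void *userdata)
    {
        uint64_t *dropped = userdata;
        uint16_t i;

        for (i = 0; i < unsent; i++)
            rte_pktmbuf_free(pkts[i]);

        *dropped += unsent;
    }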


[dpdk-dev] [PATCH v3 0/7] Performance optimizations for mlx5 and mlx4

2016-03-09 Thread Bruce Richardson
On Thu, Mar 03, 2016 at 03:27:10PM +0100, Adrien Mazarguil wrote:
> This patchset improves the mlx5 PMD performance by doing better prefetching,
> by reordering internal structure fields and by removing a few unnecessary
> operations.
> 
> Note: should be applied after "Add flow director and RX VLAN stripping
> support" to avoid conflicts.
> 
> Changes in v3:
> - None, submitted again due to dependency with previous patchset.
> 
> Changes in v2:
> - Rebased patchset on top of dpdk-next-net/rel_16_04.
> - Fixed missing update for receive function in rxq_rehash().
> - Added a commit to register memory on page boundaries instead of mempool
>   object boundaries for better performance (mlx4 and mlx5).
> 
> Adrien Mazarguil (1):
>   mlx: use aligned memory to register regions
> 
> Nelio Laranjeiro (6):
>   mlx5: prefetch next TX mbuf header and data
>   mlx5: reorder TX/RX queue structure
>   mlx5: remove one indirection level from RX/TX functions
>   mlx5: process offload flags only when requested
>   mlx5: avoid lkey retrieval for inlined packets
>   mlx5: free buffers immediately after completion
>
Applied to dpdk-next-net/rel_16_04

/Bruce



[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Thomas Monjalon
2016-03-09 15:23, Ananyev, Konstantin:
> > 
> > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > +   if (to_send == 0)
> > > > > +   return 0;
> > > >
> > > > Why this check is done in the lib?
> > > > What is the performance gain if we are idle?
> > > > It can be done outside if needed.
> > >
> > > Yes, that could be done outside, but if user has to do it anyway,
> > > why not to put it inside?
> > > I don't expect any performance gain/loss because of that -
> > > just seems a bit more convenient to the user.
> > 
> > It is handling an idle case so there is no gain obviously.
> > But the condition branching is surely a loss.
> 
> I suppose that condition should always be checked:
> either in user code prior to function call or inside the
> function call itself.
> So don't expect any difference in performance here...
> Do you have any particular example when you think it would? 
> Or are you talking about rte_eth_tx_buffer() calling
> rte_eth_tx_buffer_flush() internally?
> For that one - both are flush is 'static inline' , so I expect
> compiler be smart enough to remove this redundant check.  
> 
> > So why the user would you like to do this check?
> Just for user convenience - to save him doing that manually.

Probably I've missed something. If we remove this check, the function
will do nothing, right? How is it changing the behaviour?


[dpdk-dev] [PATCH v2] i40evf: enable ops to set mac address

2016-03-09 Thread Jingjing Wu
This patch implements the ops for adding and removing MAC
addresses in the i40evf driver. The functions are assigned as follows:
  .mac_addr_add    = i40evf_add_mac_addr,
  .mac_addr_remove = i40evf_del_mac_addr,
To support setting multiple MAC addresses, this patch also
extends MAC address addition and deletion at device start
and stop. For each VF, up to 64 MAC addresses can be added.
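
As an illustration only (not part of this patch), an application exercises
these ops through the generic ethdev API; a minimal sketch, where the port id,
pool index and address value are made up:

    #include <stdio.h>
    #include <rte_ether.h>
    #include <rte_ethdev.h>

    /* Add and later remove an extra MAC address on a VF port. The i40evf
     * .mac_addr_add / .mac_addr_remove ops added by this patch are what
     * these generic calls end up invoking. */
    static void
    vf_mac_example(uint8_t port_id)
    {
        struct ether_addr addr = {
            .addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 },
        };

        if (rte_eth_dev_mac_addr_add(port_id, &addr, 0) != 0)
            printf("failed to add MAC address\n");

        /* ... */

        rte_eth_dev_mac_addr_remove(port_id, &addr);
    }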

Signed-off-by: Jingjing Wu 
Acked-by: Zhe Tao 
---
v2 change:
 - rebase to latest dpdk-next-net/rel_16_04(commit: 0f9564a0e4f2)

 doc/guides/rel_notes/release_16_04.rst |   2 +
 drivers/net/i40e/i40e_ethdev.c |   2 -
 drivers/net/i40e/i40e_ethdev.h |   3 +
 drivers/net/i40e/i40e_ethdev_vf.c  | 123 +
 4 files changed, 98 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index eab5f92..e019669 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -105,6 +105,8 @@ This section should contain new features added in this 
release. Sample format:
   be down.
   We added the support of auto-neg by SW to avoid this link down issue.

+* **Added i40e VF mac address setting support.**
+

 Resolved Issues
 ---
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 0c87ec1..49222f4 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -62,8 +62,6 @@
 #include "i40e_rxtx.h"
 #include "i40e_pf.h"

-/* Maximun number of MAC addresses */
-#define I40E_NUM_MACADDR_MAX   64
 #define I40E_CLEAR_PXE_WAIT_MS 200

 /* Maximun number of capability elements */
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index a9b805e..237a42c 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -53,6 +53,9 @@
 #define I40E_DEFAULT_QP_NUM_FDIR  1
 #define I40E_UINT32_BIT_SIZE  (CHAR_BIT * sizeof(uint32_t))
 #define I40E_VFTA_SIZE(4096 / I40E_UINT32_BIT_SIZE)
+/* Maximun number of MAC addresses */
+#define I40E_NUM_MACADDR_MAX   64
+
 /*
  * vlan_id is a 12 bit number.
  * The VFTA array is actually a 4096 bit array, 128 of 32bit elements.
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 6185ee8..6b7b350 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -139,6 +139,11 @@ static int i40evf_dev_tx_queue_start(struct rte_eth_dev 
*dev,
 uint16_t tx_queue_id);
 static int i40evf_dev_tx_queue_stop(struct rte_eth_dev *dev,
uint16_t tx_queue_id);
+static void i40evf_add_mac_addr(struct rte_eth_dev *dev,
+   struct ether_addr *addr,
+   uint32_t index,
+   uint32_t pool);
+static void i40evf_del_mac_addr(struct rte_eth_dev *dev, uint32_t index);
 static int i40evf_dev_rss_reta_update(struct rte_eth_dev *dev,
struct rte_eth_rss_reta_entry64 *reta_conf,
uint16_t reta_size);
@@ -210,6 +215,8 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
.rx_descriptor_done   = i40e_dev_rx_descriptor_done,
.tx_queue_setup   = i40e_dev_tx_queue_setup,
.tx_queue_release = i40e_dev_tx_queue_release,
+   .mac_addr_add = i40evf_add_mac_addr,
+   .mac_addr_remove  = i40evf_del_mac_addr,
.reta_update  = i40evf_dev_rss_reta_update,
.reta_query   = i40evf_dev_rss_reta_query,
.rss_hash_update  = i40evf_dev_rss_hash_update,
@@ -855,8 +862,11 @@ i40evf_stop_queues(struct rte_eth_dev *dev)
return 0;
 }

-static int
-i40evf_add_mac_addr(struct rte_eth_dev *dev, struct ether_addr *addr)
+static void
+i40evf_add_mac_addr(struct rte_eth_dev *dev,
+   struct ether_addr *addr,
+   __rte_unused uint32_t index,
+   __rte_unused uint32_t pool)
 {
struct i40e_virtchnl_ether_addr_list *list;
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
@@ -870,7 +880,7 @@ i40evf_add_mac_addr(struct rte_eth_dev *dev, struct 
ether_addr *addr)
addr->addr_bytes[0], addr->addr_bytes[1],
addr->addr_bytes[2], addr->addr_bytes[3],
addr->addr_bytes[4], addr->addr_bytes[5]);
-   return -1;
+   return;
}

list = (struct i40e_virtchnl_ether_addr_list *)cmd_buffer;
@@ -889,25 +899,29 @@ i40evf_add_mac_addr(struct rte_eth_dev *dev, struct 
ether_addr *addr)
PMD_DRV_LOG(ERR, "fail to execute command "
"OP_ADD_ETHER_ADDRESS");

-   return err;
+   return;
 }

-static int
-i40evf_del_mac_addr(struct rte_eth_dev *dev, struct ether_addr *addr)
+static void

[dpdk-dev] [PATCH v9 0/2] eal: add function to check primary alive

2016-03-09 Thread Thomas Monjalon
2016-03-09 13:37, Harry van Haaren:
> The first patch of this patchset contains a fix for EAL PCI probing,
> to avoid a race-condition where a primary and secondary probe PCI
> devices at the same time.
> 
> The second patch adds a function that can be polled by a process to
> detect if a DPDK primary process is alive. This function does not
> rely on rte_eal_init(), as this uses the EAL and thus stops a
> primary from starting.
> 
> The functionality provided by this patch is very useful for providing
> additional services to DPDK primary applications such as monitoring
> statistics and performing fault detection.

Applied, thanks


[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-03-09 Thread Ananyev, Konstantin


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, March 09, 2016 3:52 PM
> To: Ananyev, Konstantin
> Cc: Kulasek, TomaszX; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> 2016-03-09 15:42, Ananyev, Konstantin:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2016-03-09 15:23, Ananyev, Konstantin:
> > > > >
> > > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > > +   if (to_send == 0)
> > > > > > > > +   return 0;
> > > > > > >
> > > > > > > Why this check is done in the lib?
> > > > > > > What is the performance gain if we are idle?
> > > > > > > It can be done outside if needed.
> > > > > >
> > > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > > why not to put it inside?
> > > > > > I don't expect any performance gain/loss because of that -
> > > > > > just seems a bit more convenient to the user.
> > > > >
> > > > > It is handling an idle case so there is no gain obviously.
> > > > > But the condition branching is surely a loss.
> > > >
> > > > I suppose that condition should always be checked:
> > > > either in user code prior to function call or inside the
> > > > function call itself.
> > > > So don't expect any difference in performance here...
> > > > Do you have any particular example when you think it would?
> > > > Or are you talking about rte_eth_tx_buffer() calling
> > > > rte_eth_tx_buffer_flush() internally?
> > > > For that one - both are flush is 'static inline' , so I expect
> > > > compiler be smart enough to remove this redundant check.
> > > >
> > > > > So why the user would you like to do this check?
> > > > Just for user convenience - to save him doing that manually.
> > >
> > > Probably I've missed something. If we remove this check, the function
> > > will do nothing, right? How is it changing the behaviour?
> >
> > If we'll remove that check, then
> > rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
> > will be called.
> > So in that case it might be even slower, as we'll have to do a proper call.
> 
> If there is no packet, we have time to do a useless call.

One lcore can do TX for several queues/ports.
Let say we have N queues to handle, but right now traffic is going only through
one of them. 
That means we'll have to do N-1 useless calls and reduce number of cycles
available to send actual traffic.

> 
> > Of course user can avoid it by:
> >
> > If(tx_buffer->nb_pkts != 0)
> > rte_eth_tx_buffer_flush(port, queue, tx_buffer);
> >
> > But as I said what for to force user to do that?
> > Why not to  make this check inside the function?
> 
> Because it may be slower when there are some packets
> and will "accelerate" only the no-packet case.
> 
> We do not progress in this discussion.
> It is not a big deal, 

Exactly.

>just a non sense.

Look at what most current DPDK examples do: they manually check whether
nb_pkts == 0 and only call tx_burst() if it is not.
To me it makes sense to move that check into the library function,
so that each and every caller doesn't have to do it manually.

> So I agree to keep it if we change the website to announce that DPDK
> accelerates the idle processing ;)

That's fine by me, but first I suppose you'll have to provide some data
showing that this approach slows things down, right? :)

Konstantin
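
PS: to make the two positions concrete, here is a rough sketch (the struct
and field names are taken from this patch series and the discussion above;
the exact merged API may differ):

    #include <rte_mbuf.h>
    #include <rte_ethdev.h>

    /* Stand-in for the tx buffer proposed in this series (names assumed). */
    struct tx_buffer_sketch {
        uint16_t nb_pkts;
        struct rte_mbuf *pkts[32];
    };

    /* Option A: the check lives inside the (inline) flush helper. */
    static inline uint16_t
    tx_buffer_flush_sketch(uint8_t port, uint16_t queue,
                           struct tx_buffer_sketch *buffer)
    {
        uint16_t to_send = buffer->nb_pkts;
        uint16_t sent;

        if (to_send == 0)       /* the branch under discussion */
            return 0;

        sent = rte_eth_tx_burst(port, queue, buffer->pkts, to_send);
        buffer->nb_pkts = 0;
        /* unsent packets (to_send - sent) would go to the error callback */
        return sent;
    }

    /* Option B: no check in the library, the caller tests first:
     *
     *     if (tx_buffer->nb_pkts != 0)
     *         rte_eth_tx_buffer_flush(port, queue, tx_buffer);
     */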




[dpdk-dev] [PATCH 1/2] ethdev: bump library version

2016-03-09 Thread Thomas Monjalon
> > There was an ABI change and more are coming in the release 16.04.
> > 
> > Fixes: a9963a86b2e1 ("ethdev: increase RETA entry size")
> > 
> > Signed-off-by: Thomas Monjalon 
> 
> Series Acked-by: Nelio Laranjeiro 

Applied


[dpdk-dev] [PATCH v3 0/5] Add flow director and RX VLAN stripping support

2016-03-09 Thread Bruce Richardson
On Thu, Mar 03, 2016 at 03:26:39PM +0100, Adrien Mazarguil wrote:
> To preserve compatibility with Mellanox OFED 3.1, flow director and RX VLAN
> stripping code is only enabled if compiled with 3.2.
> 
> Changes in v3:
> - Fixed flow registration issue caused by missing masks in flow rules.
> - Fixed packet duplication with overlapping FDIR rules.
> - Added FDIR flush command support.
> - Updated Mellanox OFED prerequisite to 3.2-2.0.0.0.
> 
> Changes in v2:
> - Rebased patchset on top of dpdk-next-net/rel_16_04.
> - Fixed trivial compilation warnings (positive errnos are left on purpose).
> - Updated documentation and release notes for flow director and RX VLAN
>   stripping features.
> - Fixed missing Mellanox OFED >= 3.2 check for CQ family query interface
>   version.
> 
> Yaacov Hazan (5):
>   mlx5: refactor special flows handling
>   mlx5: add special flows (broadcast and IPv6 multicast)
>   mlx5: make flow steering rule generator more generic
>   mlx5: add support for flow director
>   mlx5: add support for RX VLAN stripping
> 
Applied to dpdk-next-net/rel_16_04

Thanks,
/Bruce



[dpdk-dev] [PATCH 1/2] ethdev: bump library version

2016-03-09 Thread Nélio Laranjeiro
On Wed, Mar 09, 2016 at 03:14:07PM +0100, Thomas Monjalon wrote:
> There was an ABI change and more are coming in the release 16.04.
> 
> Fixes: a9963a86b2e1 ("ethdev: increase RETA entry size")
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  doc/guides/rel_notes/release_16_04.rst | 5 -
>  lib/librte_ether/Makefile  | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_16_04.rst 
> b/doc/guides/rel_notes/release_16_04.rst
> index 96f144e..8101f4c 100644
> --- a/doc/guides/rel_notes/release_16_04.rst
> +++ b/doc/guides/rel_notes/release_16_04.rst
> @@ -148,6 +148,9 @@ ABI Changes
>  * The fields in ethdev structure ``rte_eth_fdir_masks`` were changed
>to be in big endian.
>  
> +* The RETA entry size in ``rte_eth_rss_reta_entry64`` has been increased
> +  from 8-bit to 16-bit.
> +
>  
>  Shared Library Versions
>  ---
> @@ -158,7 +161,7 @@ The libraries prepended with a plus sign were incremented 
> in this version.
>  
>  .. code-block:: diff
>  
> - libethdev.so.2
> +   + libethdev.so.3
>   librte_acl.so.2
>   librte_cfgfile.so.2
>   librte_cmdline.so.1
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index 3e81a0e..e810284 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
>  
>  EXPORT_MAP := rte_ether_version.map
>  
> -LIBABIVER := 2
> +LIBABIVER := 3
>  
>  SRCS-y += rte_ethdev.c
>  
> -- 
> 2.7.0
> 

Series Acked-by: Nelio Laranjeiro 

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH v3 4/4] examples/ip_pipeline: add packets dumping to PCAP file support

2016-03-09 Thread Fan Zhang
This patch adds a packet dumping feature to ip_pipeline. Output port type
SINK now supports dumping packets to a PCAP file before releasing mbufs back
to the mempool. This feature can be enabled by specifying parameters in the
configuration file as shown below:

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = SOURCE0 SOURCE1
pktq_out = SINK0 SINK1
pcap_file_wr = /path/to/eth1.pcap /path/to/eth2.pcap
pcap_n_pkt_wr = 80 0

The configuration section "pcap_file_wr" contains the full path and name of
the PCAP file which the packets will be dumped to. If multiple SINKs
exist, each shall have its own PCAP file path listed in this section,
separated by spaces. Multiple SINK ports shall NOT share the same PCAP file.

The configuration section "pcap_n_pkt_wr" contains integer value(s)
and indicates the maximum number of packets to be dumped to the PCAP file.
If this value is "0", the "infinite" dumping mode will be used. If this
value is N (N > 0), the dumping will be finished when the number of
packets dumped to the file reaches N.

To enable PCAP dumping support to IP pipeline, the compiler option
CONFIG_RTE_PORT_PCAP must be set to 'y'. It is possible to disable this
feature by removing "pcap_file_wr" and "pcap_n_pkt_wr" lines from the
configuration file.

Signed-off-by: Fan Zhang 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/app.h  |   2 +
 examples/ip_pipeline/config_parse.c | 172 
 examples/ip_pipeline/init.c |  12 +++
 examples/ip_pipeline/pipeline_be.h  |   4 +-
 4 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 0c22f7f..55a9841 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -156,6 +156,8 @@ struct app_pktq_source_params {
 struct app_pktq_sink_params {
char *name;
uint8_t parsed;
+   char *file_name; /* Full path of PCAP file to be copied to mbufs */
+   uint32_t n_pkts_to_dump;
 };

 struct app_msgq_params {
diff --git a/examples/ip_pipeline/config_parse.c 
b/examples/ip_pipeline/config_parse.c
index 291dbfb..e39c23e 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -187,6 +187,8 @@ struct app_pktq_source_params default_source_params = {

 struct app_pktq_sink_params default_sink_params = {
.parsed = 0,
+   .file_name = NULL,
+   .n_pkts_to_dump = 0,
 };

 struct app_msgq_params default_msgq_params = {
@@ -1036,6 +1038,85 @@ parse_pipeline_pcap_source(struct app_params *app,
 }

 static int
+parse_pipeline_pcap_sink(struct app_params *app,
+   struct app_pipeline_params *p,
+   const char *file_name, const char *n_pkts_to_dump)
+{
+   const char *next = NULL;
+   char *end;
+   uint32_t i;
+   int parse_file = 0;
+
+   if (file_name && !n_pkts_to_dump) {
+   next = file_name;
+   parse_file = 1; /* parse file path */
+   } else if (n_pkts_to_dump && !file_name) {
+   next = n_pkts_to_dump;
+   parse_file = 0; /* parse copy size */
+   } else
+   return -EINVAL;
+
+   char name[APP_PARAM_NAME_SIZE];
+   size_t name_len;
+
+   if (p->n_pktq_out == 0)
+   return -EINVAL;
+
+   for (i = 0; i < p->n_pktq_out; i++) {
+   if (p->pktq_out[i].type != APP_PKTQ_OUT_SINK)
+   return -EINVAL;
+   }
+
+   i = 0;
+   while (*next != '\0') {
+   uint32_t id;
+
+   if (i >= p->n_pktq_out)
+   return -EINVAL;
+
+   id = p->pktq_out[i].id;
+
+   end = strchr(next, ' ');
+   if (!end)
+   name_len = strlen(next);
+   else
+   name_len = end - next;
+
+   if (name_len == 0 || name_len == sizeof(name))
+   return -EINVAL;
+
+   strncpy(name, next, name_len);
+   name[name_len] = '\0';
+   next += name_len;
+   if (*next != '\0')
+   next++;
+
+   if (parse_file) {
+   app->sink_params[id].file_name = strdup(name);
+   if (app->sink_params[id].file_name == NULL)
+   return -ENOMEM;
+   } else {
+   if (parser_read_uint32(
+   >sink_params[id].n_pkts_to_dump,
+   name) != 0) {
+   if (app->sink_params[id].file_name !=
+   NULL)
+   free(app->sink_params[id].
+   file_name);
+   return -EINVAL;
+   }
+   }
+
+   i++;
+
+   if (i == p->n_pktq_out)
+   return 0;
+   }
+
+   return 

[dpdk-dev] [PATCH v3 3/4] lib/librte_port: add packet dumping to PCAP file support in sink port

2016-03-09 Thread Fan Zhang
Originally, sink ports in librte_port release received mbufs back to the
mempool. This patch adds an optional packet-dumping-to-PCAP feature to the
sink port: packets are dumped to a user-defined PCAP file for storage or
debugging. The user may also choose the sink port's behaviour: either it
continuously dumps packets to the file, or it stops after a certain number
of packets has been dumped.

This feature shares the same CONFIG_RTE_PORT_PCAP compiler option as the
source port PCAP file support feature. Users can enable or disable this
feature by setting the CONFIG_RTE_PORT_PCAP compiler option to "y" or "n".

Signed-off-by: Fan Zhang 
Acked-by: Cristian Dumitrescu 
---
 lib/librte_port/rte_port_source_sink.c | 248 -
 lib/librte_port/rte_port_source_sink.h |  11 +-
 2 files changed, 256 insertions(+), 3 deletions(-)

diff --git a/lib/librte_port/rte_port_source_sink.c 
b/lib/librte_port/rte_port_source_sink.c
index 3d4e8d9..6a7ba64 100644
--- a/lib/librte_port/rte_port_source_sink.c
+++ b/lib/librte_port/rte_port_source_sink.c
@@ -40,6 +40,7 @@
 #ifdef RTE_NEXT_ABI

 #include 
+#include 

 #ifdef RTE_PORT_PCAP
 #include 
@@ -400,12 +401,183 @@ rte_port_source_stats_read(void *port,

 struct rte_port_sink {
struct rte_port_out_stats stats;
+
+   /* PCAP dumper handle and pkts number */
+   void *dumper;
+   uint32_t max_pkts;
+   uint32_t pkt_index;
+   uint32_t dump_finish;
 };

+#ifdef RTE_PORT_PCAP
+
+/**
+ * Open PCAP file for dumping packets to the file later
+ *
+ * @param port
+ *   Handle to sink port
+ * @param p
+ *   Sink port parameter
+ * @return
+ *   0 on SUCCESS
+ *   error code otherwise
+ */
+static int
+pcap_sink_open(struct rte_port_sink *port,
+   __rte_unused struct rte_port_sink_params *p)
+{
+   pcap_t *tx_pcap;
+   pcap_dumper_t *pcap_dumper;
+
+   if (p->file_name == NULL) {
+   port->dumper = NULL;
+   port->max_pkts = 0;
+   port->pkt_index = 0;
+   port->dump_finish = 0;
+   return 0;
+   }
+
+   /** Open a dead pcap handler for opening dumper file */
+   tx_pcap = pcap_open_dead(DLT_EN10MB, 65535);
+   if (tx_pcap == NULL)
+   return -ENOENT;
+
+   /* The dumper is created using the previous pcap_t reference */
+   pcap_dumper = pcap_dump_open(tx_pcap, p->file_name);
+   if (pcap_dumper == NULL)
+   return -ENOENT;
+
+   port->dumper = pcap_dumper;
+   port->max_pkts = p->max_n_pkts;
+   port->pkt_index = 0;
+   port->dump_finish = 0;
+
+   return 0;
+}
+
+uint8_t jumbo_pkt_buf[ETHER_MAX_JUMBO_FRAME_LEN];
+
+/**
+ * Dump a packet to PCAP dumper
+ *
+ * @param p
+ *   Handle to sink port
+ * @param mbuf
+ *   Handle to mbuf structure holding the packet
+ */
+static void
+pcap_sink_dump_pkt(struct rte_port_sink *port, struct rte_mbuf *mbuf)
+{
+   uint8_t *pcap_dumper = (uint8_t *)(port->dumper);
+   struct pcap_pkthdr pcap_hdr;
+   uint8_t *pkt;
+
+   /* Maximum num packets already reached */
+   if (port->dump_finish)
+   return;
+
+   pkt = rte_pktmbuf_mtod(mbuf, uint8_t *);
+
+   pcap_hdr.len = mbuf->pkt_len;
+   pcap_hdr.caplen = pcap_hdr.len;
+   gettimeofday(&(pcap_hdr.ts), NULL);
+
+   if (mbuf->nb_segs > 1) {
+   struct rte_mbuf *jumbo_mbuf;
+   uint32_t pkt_index = 0;
+
+   /* if packet size longer than ETHER_MAX_JUMBO_FRAME_LEN,
+* ignore it.
+*/
+   if (mbuf->pkt_len > ETHER_MAX_JUMBO_FRAME_LEN)
+   return;
+
+   for (jumbo_mbuf = mbuf; jumbo_mbuf != NULL;
+   jumbo_mbuf = jumbo_mbuf->next) {
+   rte_memcpy(&jumbo_pkt_buf[pkt_index],
+   rte_pktmbuf_mtod(jumbo_mbuf, uint8_t *),
+   jumbo_mbuf->data_len);
+   pkt_index += jumbo_mbuf->data_len;
+   }
+
+   jumbo_pkt_buf[pkt_index] = '\0';
+
+   pkt = jumbo_pkt_buf;
+   }
+
+   pcap_dump(pcap_dumper, &pcap_hdr, pkt);
+
+   port->pkt_index++;
+
+   if ((port->max_pkts != 0) && (port->pkt_index >= port->max_pkts)) {
+   port->dump_finish = 1;
+   RTE_LOG(INFO, PORT, "Dumped %u packets to file\n",
+   port->pkt_index);
+   }
+
+}
+
+/**
+ * Flush pcap dumper
+ *
+ * @param dumper
+ *   Handle to pcap dumper
+ */
+
+static void
+pcap_sink_flush_pkt(void *dumper)
+{
+   pcap_dumper_t *pcap_dumper = (pcap_dumper_t *)dumper;
+
+   pcap_dump_flush(pcap_dumper);
+}
+
+/**
+ * Close a PCAP dumper handle
+ *
+ * @param dumper
+ *   Handle to pcap dumper
+ */
+static void
+pcap_sink_close(void *dumper)
+{
+   pcap_dumper_t *pcap_dumper = (pcap_dumper_t *)dumper;
+
+   pcap_dump_close(pcap_dumper);
+}
+
+#else
+
+static int
+pcap_sink_open(struct rte_port_sink *port,
+  

[dpdk-dev] [PATCH v3 2/4] example/ip_pipeline: add PCAP file support

2016-03-09 Thread Fan Zhang
This patch adds PCAP file support to ip_pipeline. Input port type SOURCE
now supports loading a specified PCAP file and sending the packets in it to
the pipeline instance. The packets are then released by the SINK output port.
This feature can be enabled by specifying parameters in the configuration
file as shown below:

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = SOURCE0 SOURCE1
pktq_out = SINK0 SINK1
pcap_file_rd = /path/to/eth1.PCAP /path/to/eth2.PCAP
pcap_bytes_rd_per_pkt = 0 64

The configuration section "pcap_file_rd" contains the full path and name of
the PCAP file to be loaded. If multiple SOURCEs exist, each shall have
its own PCAP file path listed in this section, separated by spaces.
Multiple SOURCE ports may share the same PCAP file to be copied.

The configuration section "pcap_bytes_rd_per_pkt" contains an integer value
and indicates the maximum number of bytes to be copied from each packet
in the PCAP file. If this value is "0", all packets in the file will be
copied fully; if the packet size is smaller than the assigned value, the
entire packet is copied. As with "pcap_file_rd", every SOURCE shall have
its own maximum number of bytes to copy.

To enable PCAP support to IP pipeline, the compiler option
CONFIG_RTE_PORT_PCAP must be set to 'y'. It is possible to disable PCAP
support by removing "pcap_file_rd" and "pcap_bytes_rd_per_pkt" lines
from the configuration file.

Signed-off-by: Fan Zhang 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/app.h  |   4 +-
 examples/ip_pipeline/config_parse.c | 119 +++-
 examples/ip_pipeline/init.c |  17 +-
 3 files changed, 137 insertions(+), 3 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index f55aef8..0c22f7f 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -149,6 +149,8 @@ struct app_pktq_source_params {
uint32_t parsed;
uint32_t mempool_id; /* Position in the app->mempool_params array */
uint32_t burst;
+   char *file_name; /* Full path of PCAP file to be copied to mbufs */
+   uint32_t n_bytes_per_pkt;
 };

 struct app_pktq_sink_params {
diff --git a/examples/ip_pipeline/config_parse.c 
b/examples/ip_pipeline/config_parse.c
index 4695ac1..291dbfb 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -181,6 +181,8 @@ struct app_pktq_source_params default_source_params = {
.parsed = 0,
.mempool_id = 0,
.burst = 32,
+   .file_name = NULL,
+   .n_bytes_per_pkt = 0,
 };

 struct app_pktq_sink_params default_sink_params = {
@@ -955,6 +957,85 @@ parse_eal(struct app_params *app,
 }

 static int
+parse_pipeline_pcap_source(struct app_params *app,
+   struct app_pipeline_params *p,
+   const char *file_name, const char *cp_size)
+{
+   const char *next = NULL;
+   char *end;
+   uint32_t i;
+   int parse_file = 0;
+
+   if (file_name && !cp_size) {
+   next = file_name;
+   parse_file = 1; /* parse file path */
+   } else if (cp_size && !file_name) {
+   next = cp_size;
+   parse_file = 0; /* parse copy size */
+   } else
+   return -EINVAL;
+
+   char name[APP_PARAM_NAME_SIZE];
+   size_t name_len;
+
+   if (p->n_pktq_in == 0)
+   return -EINVAL;
+
+   for (i = 0; i < p->n_pktq_in; i++) {
+   if (p->pktq_in[i].type != APP_PKTQ_IN_SOURCE)
+   return -EINVAL;
+   }
+
+   i = 0;
+   while (*next != '\0') {
+   uint32_t id;
+
+   if (i >= p->n_pktq_in)
+   return -EINVAL;
+
+   id = p->pktq_in[i].id;
+
+   end = strchr(next, ' ');
+   if (!end)
+   name_len = strlen(next);
+   else
+   name_len = end - next;
+
+   if (name_len == 0 || name_len == sizeof(name))
+   return -EINVAL;
+
+   strncpy(name, next, name_len);
+   name[name_len] = '\0';
+   next += name_len;
+   if (*next != '\0')
+   next++;
+
+   if (parse_file) {
+   app->source_params[id].file_name = strdup(name);
+   if (app->source_params[id].file_name == NULL)
+   return 

[dpdk-dev] [PATCH v3 1/4] lib/librte_port: add PCAP file support to source port

2016-03-09 Thread Fan Zhang
Originally, a source port in librte_port is an input port used as a packet
generator. Similar to the Linux kernel /dev/zero character device, it
generates null packets. This patch adds optional PCAP file support to the
source port: instead of sending null packets, the source port generates
packets copied from a PCAP file. To increase performance, the packets in
the file are initially loaded into memory and then copied to mbufs in a
circular manner. Users can enable or disable this feature by setting the
CONFIG_RTE_PORT_PCAP compiler option to "y" or "n".

Signed-off-by: Fan Zhang 
Acked-by: Cristian Dumitrescu 
---
 config/common_base |   1 +
 lib/librte_port/Makefile   |  10 +-
 lib/librte_port/rte_port_source_sink.c | 251 -
 lib/librte_port/rte_port_source_sink.h |  13 +-
 mk/rte.app.mk  |   5 +
 5 files changed, 275 insertions(+), 5 deletions(-)

diff --git a/config/common_base b/config/common_base
index c73f71a..3be2f18 100644
--- a/config/common_base
+++ b/config/common_base
@@ -458,6 +458,7 @@ CONFIG_RTE_LIBRTE_REORDER=y
 #
 CONFIG_RTE_LIBRTE_PORT=y
 CONFIG_RTE_PORT_STATS_COLLECT=n
+CONFIG_RTE_PORT_PCAP=n

 #
 # Compile librte_table
diff --git a/lib/librte_port/Makefile b/lib/librte_port/Makefile
index 410053e..0b31c04 100644
--- a/lib/librte_port/Makefile
+++ b/lib/librte_port/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -36,6 +36,14 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_port.a

+ifeq ($(CONFIG_RTE_NEXT_ABI),y)
+
+ifeq ($(CONFIG_RTE_PORT_PCAP),y)
+LDLIBS += -lpcap
+endif
+
+endif
+
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)

diff --git a/lib/librte_port/rte_port_source_sink.c 
b/lib/librte_port/rte_port_source_sink.c
index a06477e..3d4e8d9 100644
--- a/lib/librte_port/rte_port_source_sink.c
+++ b/lib/librte_port/rte_port_source_sink.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -37,6 +37,16 @@
 #include 
 #include 

+#ifdef RTE_NEXT_ABI
+
+#include 
+
+#ifdef RTE_PORT_PCAP
+#include 
+#endif
+
+#endif
+
 #include "rte_port_source_sink.h"

 /*
@@ -60,8 +70,174 @@ struct rte_port_source {
struct rte_port_in_stats stats;

struct rte_mempool *mempool;
+
+#ifdef RTE_NEXT_ABI
+   /* PCAP buffers and indexes */
+   uint8_t **pkts;
+   uint8_t *pkt_buff;
+   uint32_t *pkt_len;
+   uint32_t n_pkts;
+   uint32_t pkt_index;
+#endif
 };

+#ifdef RTE_NEXT_ABI
+
+#ifdef RTE_PORT_PCAP
+
+/**
+ * Load PCAP file, allocate and copy packets in the file to memory
+ *
+ * @param p
+ *   Parameters for source port
+ * @param port
+ *   Handle to source port
+ * @param socket_id
+ *   Socket id where the memory is created
+ * @return
+ *   0 on SUCCESS
+ *   error code otherwise
+ */
+static int
+pcap_source_load(struct rte_port_source_params *p,
+   struct rte_port_source *port,
+   int socket_id)
+{
+   uint32_t status = 0;
+   uint32_t n_pkts = 0;
+   uint32_t i;
+   uint32_t *pkt_len_aligns = NULL;
+   size_t total_buff_len = 0;
+   pcap_t *pcap_handle;
+   char pcap_errbuf[PCAP_ERRBUF_SIZE];
+   uint32_t max_len;
+   struct pcap_pkthdr pcap_hdr;
+   const uint8_t *pkt;
+   uint8_t *buff = NULL;
+   uint32_t pktmbuf_maxlen = (uint32_t)
+   (rte_pktmbuf_data_room_size(port->mempool) -
+   RTE_PKTMBUF_HEADROOM);
+
+   if (p->file_name == NULL)
+   return 0;
+
+   if (p->n_bytes_per_pkt == 0)
+   max_len = pktmbuf_maxlen;
+   else
+   max_len = RTE_MIN(p->n_bytes_per_pkt, pktmbuf_maxlen);
+
+   /* first time open, get packet number */
+   pcap_handle = pcap_open_offline(p->file_name, pcap_errbuf);
+   if (pcap_handle == NULL) {
+   status = -ENOENT;
+   goto error_exit;
+   }
+
+   while ((pkt = pcap_next(pcap_handle, &pcap_hdr)) != NULL)
+   n_pkts++;
+
+   pcap_close(pcap_handle);
+
+   port->pkt_len = rte_zmalloc_socket("PCAP",
+   (sizeof(*port->pkt_len) * n_pkts), 0, socket_id);
+   if (port->pkt_len == NULL) {
+   status = -ENOMEM;
+   goto error_exit;
+   }
+
+   pkt_len_aligns = rte_malloc("PCAP",
+   (sizeof(*pkt_len_aligns) * n_pkts), 0);
+   if (pkt_len_aligns == NULL) {
+   status = -ENOMEM;
+   goto error_exit;
+   }
+
+   port->pkts = rte_zmalloc_socket("PCAP",
+   (sizeof(*port->pkts) * 

[dpdk-dev] [PATCH v3 0/4] Add PCAP support to source and sink port

2016-03-09 Thread Fan Zhang
This patchset adds features to the source and sink type ports in the
librte_port library, and to examples/ip_pipeline. Originally, source/sink
ports act as the input and output of a null packet generator. This patchset
enables them to read from and write to a specified PCAP file, to generate
and dump packets.

v3:
*added RTE_NEXT_ABI macro to source port
*updated to fit ip_pipeline configuration new code style

v2:
*fixed source/sink open function returns
*removed duplicated code
*added clearer error message display on different error messages

Acked-by: Cristian Dumitrescu 

Fan Zhang (4):
  lib/librte_port: add PCAP file support to source port
  example/ip_pipeline: add PCAP file support
  lib/librte_port: add packet dumping to PCAP file support in sink port
  examples/ip_pipeline: add packets dumping to PCAP file support

 config/common_base |   1 +
 examples/ip_pipeline/app.h |   6 +-
 examples/ip_pipeline/config_parse.c| 291 ++-
 examples/ip_pipeline/init.c|  29 +-
 examples/ip_pipeline/pipeline_be.h |   4 +-
 lib/librte_port/Makefile   |  10 +-
 lib/librte_port/rte_port_source_sink.c | 499 -
 lib/librte_port/rte_port_source_sink.h |  24 +-
 mk/rte.app.mk  |   5 +
 9 files changed, 857 insertions(+), 12 deletions(-)

-- 
2.5.0



[dpdk-dev] [PATCH v9 2/2] eal: add function to check if primary proc alive

2016-03-09 Thread Thomas Monjalon
2016-03-09 13:37, Harry van Haaren:
> This patch adds a new function to the EAL API:
> int rte_eal_primary_proc_alive(const char *path);
> 
> The function indicates if a primary process is alive right now.
> This functionality is implemented by testing for a write-
> lock on the config file, and the function tests for a lock.
> 
> The use case for this functionality is that a secondary
> process can wait until a primary process starts by polling
> the function and waiting. When the primary is running, the
> secondary continues to poll to detect if the primary process
> has quit unexpectedly, the secondary process can detect this.
> 
> Signed-off-by: Harry van Haaren 
> Acked-by: Maryam Tahhan 
> ---
>  doc/guides/rel_notes/release_16_04.rst  |  8 
>  lib/librte_eal/bsdapp/eal/Makefile  |  1 +
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>  lib/librte_eal/common/eal_common_proc.c | 61 
> +
>  lib/librte_eal/common/include/rte_eal.h | 20 +++-
>  lib/librte_eal/linuxapp/eal/Makefile|  3 +-
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  7 files changed, 93 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_eal/common/eal_common_proc.c
> 
> diff --git a/doc/guides/rel_notes/release_16_04.rst 
> b/doc/guides/rel_notes/release_16_04.rst
> index 24f15bf..7d5000f 100644
> --- a/doc/guides/rel_notes/release_16_04.rst
> +++ b/doc/guides/rel_notes/release_16_04.rst
> @@ -74,6 +74,14 @@ EAL
>  ~~~
>  
>  
> +* **Added rte_eal_primary_proc_alive() function**
> +
> +  A new function ``rte_eal_primary_proc_alive()`` has been added
> +  to allow the user to detect if a primary process is running.
> +  Use cases for this feature include fault detection, and monitoring
> +  using secondary processes.

It is not in the right section (fixed issues).
Moved and reworded before applying:
* **Added function to check primary process state.**
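
For anyone looking at the use case, a rough sketch of a monitoring secondary
polling this function (the config file path and poll interval are arbitrary
examples; no rte_eal_init() is required, which is the point of the API):

    #include <stdio.h>
    #include <unistd.h>
    #include <rte_eal.h>

    int
    main(void)
    {
        /* example path only; the primary's runtime config file */
        const char *cfg = "/var/run/.rte_config";

        /* wait for a primary process to come up */
        while (!rte_eal_primary_proc_alive(cfg)) {
            printf("waiting for primary...\n");
            sleep(1);
        }
        printf("primary is up\n");

        /* keep polling to detect an unexpected exit */
        while (rte_eal_primary_proc_alive(cfg))
            sleep(1);

        printf("primary has exited\n");
        return 0;
    }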


[dpdk-dev] [PATCH v3 4/4] mempool: add in the RTE_NEXT_ABI for ABI breakages

2016-03-09 Thread Olivier MATZ
Hi David,

On 03/09/2016 12:30 PM, Hunt, David wrote:
> Hi Panu,
> 
> On 3/9/2016 10:46 AM, Panu Matilainen wrote:
>> On 03/09/2016 11:50 AM, David Hunt wrote:
>>> This patch is for those people who want to be easily able to switch
>>> between the new mempool layout and the old. Change the value of
>>> RTE_NEXT_ABI in common_base config file
>>
>> I guess the idea here is to document how to switch between the ABIs
>> but to me this reads as if this patch is supposed to change the value
>> in common_base. Of course there's  no such change included (nor should
>> there be) here, but the description could use some fine-tuning perhaps.
>>
> 
> You're right, I'll clarify the comments. v4 due soon.
> 
>>>
>>> v3: Updated to take re-work of file layouts into consideration
>>>
>>> v2: Kept all the NEXT_ABI defs to this patch so as to make the
>>> previous patches easier to read, and also to imake it clear what
>>> code is necessary to keep ABI compatibility when NEXT_ABI is
>>> disabled.
>>
>> Maybe its just me, but:
>> I can see why NEXT_ABI is in a separate patch for review purposes but
>> for final commit this split doesn't seem right to me. In any case its
>> quite a large change for NEXT_ABI.
>>
> 
> The patch basically re-introduces the old (pre-mempool) code as the
> refactoring of the code would have made the NEXT_ABI additions totally
> unreadable. I think this way is the lesser of two evils.
> 
>> In any case, you should add a deprecation notice for the oncoming ABI
>> break in 16.07.
>>
> 
> Sure, I'll add that in v4.

Sorry, maybe I wasn't very clear in my previous messages. For me, the
NEXT_ABI is not the proper solution because, as Panu stated, it makes
the patch hard to read. My understanding of NEXT_ABI is that it should
only be used if the changes are small enough. Duplicating the code with
a big #ifdef NEXT_ABI is not an option to me either.

So that's why the deprecation notice should be used instead. But in this
case, it means that this patch won't be present in 16.04, but will be
added in 16.07.

Regards,
Olivier


[dpdk-dev] [PATCH 2/6] mempool: add stack (lifo) based external mempool handler

2016-03-09 Thread Olivier MATZ
Hi,

>> Hi David,
>>
>> On 02/16/2016 03:48 PM, David Hunt wrote:
>>> adds a simple stack based mempool handler
>>>
>>> Signed-off-by: David Hunt 
>>> ---
>>>  lib/librte_mempool/Makefile|   2 +-
>>>  lib/librte_mempool/rte_mempool.c   |   4 +-
>>>  lib/librte_mempool/rte_mempool.h   |   1 +
>>>  lib/librte_mempool/rte_mempool_stack.c | 164
>>> +
>>>  4 files changed, 169 insertions(+), 2 deletions(-)  create mode
>>> 100644 lib/librte_mempool/rte_mempool_stack.c
>>>
>>
>> I don't get what is the purpose of this handler. Is it an example or is it
>> something that could be useful for dpdk applications?
>>
> This is actually something that is useful for pipelining apps,
> where the mempool cache doesn't really work - example, where we
> have one core doing rx (and alloc), and another core doing
> Tx (and return). In such a case, the mempool ring simply cycles
> through all the mbufs, resulting in a LLC miss on every mbuf
> allocated when the number of mbufs is large. A stack recycles
> buffers more effectively in this case.
> 

While I agree in principle, if this is the case the commit should
come with an explanation of when this handler should be used, a
small test report showing the performance numbers, and probably an
example app.

Also, I think there is some room for optimization; in particular, I
don't think that the spinlock will scale to many cores.
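
As a side note for readers less familiar with the pipelining case David
describes, the pattern is roughly the sketch below (port ids, ring and burst
sizes are arbitrary and error handling is omitted): mbufs are allocated on
the RX core and freed on the TX core, so the per-lcore mempool cache gets few
hits and the default ring-based pool ends up cycling through all the mbufs.

    #include <rte_mbuf.h>
    #include <rte_ring.h>
    #include <rte_ethdev.h>

    /* RX core: allocation happens here (inside rte_eth_rx_burst). */
    static int
    rx_core(void *arg)
    {
        struct rte_ring *r = arg;
        struct rte_mbuf *pkts[32];

        for (;;) {
            uint16_t n = rte_eth_rx_burst(0, 0, pkts, 32);
            if (n)
                rte_ring_enqueue_burst(r, (void **)pkts, n); /* drops ignored */
        }
        return 0;
    }

    /* TX core: packets are freed here (by the PMD after transmit),
     * i.e. on a different core than the one that allocated them. */
    static int
    tx_core(void *arg)
    {
        struct rte_ring *r = arg;
        struct rte_mbuf *pkts[32];

        for (;;) {
            unsigned n = rte_ring_dequeue_burst(r, (void **)pkts, 32);
            if (n)
                rte_eth_tx_burst(1, 0, pkts, n);
        }
        return 0;
    }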

Regards,
Olivier


[dpdk-dev] [PATCH v7 5/5] app/testpmd: add CLIs for E-tag operation

2016-03-09 Thread Wenzhuo Lu
Add the CLIs to support the E-tag operations:
1. Offloading of E-tag insertion and stripping.
2. Forwarding E-tag packets to pools based on the GRP and E-CID_base.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c  | 424 
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  26 ++
 2 files changed, 450 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 28be8e5..e4fd617 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -502,6 +502,27 @@ static void cmd_help_long_parsed(void *parsed_result,
"set link-down port (port_id)\n"
"   Set link down for a port.\n\n"

+   "E-tag set insertion on port-tag-id (value)"
+   " port (port_id) vf (vf_id)\n"
+   "Enable E-tag insertion for a VF on a port\n\n"
+
+   "E-tag set insertion off port (port_id) vf (vf_id)\n"
+   "Disable E-tag insertion for a VF on a port\n\n"
+
+   "E-tag set stripping (on|off) port (port_id)\n"
+   "Enable/disable E-tag stripping on a port\n\n"
+
+   "E-tag set forwarding (on|off) port (port_id)\n"
+   "Enable/disable E-tag based forwarding"
+   " on a port\n\n"
+
+   "E-tag set filter add e-tag-id (value) dst-pool"
+   " (pool_id) port (port_id)\n"
+   "Add an E-tag forwarding filter on a port\n\n"
+
+   "E-tag set filter del e-tag-id (value) port (port_id)\n"
+   "Delete an E-tag forwarding filter on a port\n\n"
+
, list_pkt_forwarding_modes()
);
}
@@ -9904,6 +9925,403 @@ cmdline_parse_inst_t 
cmd_config_l2_tunnel_en_dis_specific = {
},
 };

+/* E-tag configuration */
+
+/* Common result structure for all E-tag configuration */
+struct cmd_config_e_tag_result {
+   cmdline_fixed_string_t e_tag;
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t insertion;
+   cmdline_fixed_string_t stripping;
+   cmdline_fixed_string_t forwarding;
+   cmdline_fixed_string_t filter;
+   cmdline_fixed_string_t add;
+   cmdline_fixed_string_t del;
+   cmdline_fixed_string_t on;
+   cmdline_fixed_string_t off;
+   cmdline_fixed_string_t on_off;
+   cmdline_fixed_string_t port_tag_id;
+   uint32_t port_tag_id_val;
+   cmdline_fixed_string_t e_tag_id;
+   uint16_t e_tag_id_val;
+   cmdline_fixed_string_t dst_pool;
+   uint8_t dst_pool_val;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+   cmdline_fixed_string_t vf;
+   uint8_t vf_id;
+};
+
+/* Common CLI fields for all E-tag configuration */
+cmdline_parse_token_string_t cmd_config_e_tag_e_tag =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+e_tag, "E-tag");
+cmdline_parse_token_string_t cmd_config_e_tag_set =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+set, "set");
+cmdline_parse_token_string_t cmd_config_e_tag_insertion =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+insertion, "insertion");
+cmdline_parse_token_string_t cmd_config_e_tag_stripping =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+stripping, "stripping");
+cmdline_parse_token_string_t cmd_config_e_tag_forwarding =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+forwarding, "forwarding");
+cmdline_parse_token_string_t cmd_config_e_tag_filter =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+filter, "filter");
+cmdline_parse_token_string_t cmd_config_e_tag_add =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+add, "add");
+cmdline_parse_token_string_t cmd_config_e_tag_del =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+del, "del");
+cmdline_parse_token_string_t cmd_config_e_tag_on =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+on, "on");
+cmdline_parse_token_string_t cmd_config_e_tag_off =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+off, "off");
+cmdline_parse_token_string_t cmd_config_e_tag_on_off =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+on_off, "on#off");
+cmdline_parse_token_string_t cmd_config_e_tag_port_tag_id =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+port_tag_id, "port-tag-id");
