[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-07-01 Thread Burakov, Anatoly
Hi Matt,

> I'm curious - is it possible in practical terms to run DPDK without hugepages?

Starting with release 1.7.0, support for VFIO was added, which allows using
DPDK without hugepages at all (including RX/TX rings) via the --no-huge
command-line parameter. Bear in mind, though, that to use VFIO you'll need
IOMMU/VT-d enabled (i.e. no VM support, only host-based) and a supported
kernel version (3.6+); also, the memory size will be limited to 1G, and it
won't work with multiprocess. I don't have any performance figures on that,
unfortunately.
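
For example, an invocation along these lines should work (the core mask,
channel count and memory size below are placeholders, not a tested
configuration):

    ./your_app -c 0x3 -n 4 --no-huge -m 1024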

Best regards,
Anatoly Burakov
DPDK SW Engineer


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-07-01 Thread Ananyev, Konstantin
Hi,

> Hi Matt,
> 
> On Mon, Jun 30, 2014 at 05:43:39PM -0500, Matt Laswell wrote:
> > Hey Folks,
> >
> > In my application, I'm seeing some design considerations in a project
> > I'm working on that push me towards the use of smaller memory page
> > sizes.  I'm curious - is it possible in practical terms to run DPDK without 
> > hugepages?
> 
> >  If so, does anybody have any practical experience (or a
> > back-of-the-envelope estimate) of how badly such a configuration would
> > hurt performance?  For the sake of argument, assume that virtually all
> > of the memory being used is in pre-allocated mempools (e.g. lots of
> > rte_mempool_create(), very little rte_malloc()).
> >
> 
> There is one case in the DPDK source code where DPDK runs without
> hugepages: Xen Dom0 support, for which we developed a dom0_mm driver.
> Except for Xen Dom0, it is not possible to run DPDK without hugepages
> without changes to the memory initialization phase, but the current
> rte_memzone_reserve_bounded() and rte_mempool_xmem_create()
> implementations already support non-hugepage usage in DPDK.
> 

On Linux, testpmd can run with its mempool on 4K pages (though the RX/TX HW
rings are still on hugepages).
To try it, add "--mp-anon" to your testpmd command line.
Also, to get a more 'real' picture, you can disable mempool caching with
"--mbcache=0".
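
For example (the EAL core mask and channel count are just placeholders):

    ./testpmd -c 0x3 -n 4 -- -i --mp-anon --mbcache=0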

Konstantin


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-07-01 Thread Liu, Jijiang
Hi Matt,

On Mon, Jun 30, 2014 at 05:43:39PM -0500, Matt Laswell wrote:
> Hey Folks,
> 
> In my application, I'm seeing some design considerations in a project 
> I'm working on that push me towards the use of smaller memory page 
> sizes.  I'm curious - is it possible in practical terms to run DPDK without 
> hugepages?

>  If so, does anybody have any practical experience (or a
> back-of-the-envelope estimate) of how badly such a configuration would
> hurt performance?  For the sake of argument, assume that virtually all of
> the memory being used is in pre-allocated mempools (e.g. lots of
> rte_mempool_create(), very little rte_malloc()).
> 

There is one case in the DPDK source code where DPDK runs without hugepages:
Xen Dom0 support, for which we developed a dom0_mm driver.
Except for Xen Dom0, it is not possible to run DPDK without hugepages without
changes to the memory initialization phase, but the current
rte_memzone_reserve_bounded() and rte_mempool_xmem_create() implementations
already support non-hugepage usage in DPDK.
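
For illustration only, here is a rough, untested sketch of creating a
pktmbuf pool on externally allocated 4K pages; the names and sizes are made
up, and it assumes the 1.7-era rte_mempool_xmem_create() prototype:

    #include <stdint.h>
    #include <rte_memory.h>
    #include <rte_mempool.h>
    #include <rte_mbuf.h>
    #include <rte_lcore.h>

    /* Sketch: build a pktmbuf pool on caller-provided 4K pages.
     * 'vaddr' is the virtual base of the area, 'paddr' holds the
     * physical address of each 4K page, 'pg_num' is the page count. */
    static struct rte_mempool *
    mempool_on_4k_pages(void *vaddr, const phys_addr_t paddr[],
                        uint32_t pg_num)
    {
            const uint32_t pg_shift = 12;       /* log2(4KB) */
            unsigned elt_size = 2048 + sizeof(struct rte_mbuf) +
                                RTE_PKTMBUF_HEADROOM;

            return rte_mempool_xmem_create("mbuf_pool_4k",
                            8192, elt_size,
                            0,                  /* no per-lcore cache */
                            sizeof(struct rte_pktmbuf_pool_private),
                            rte_pktmbuf_pool_init, NULL,
                            rte_pktmbuf_init, NULL,
                            rte_socket_id(), 0, /* socket, flags */
                            vaddr, paddr, pg_num, pg_shift);
    }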



Thanks,
Jeff


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-07-01 Thread Matt Laswell
Thanks everybody,

It sounds as though what I'm looking for may be possible, especially with
1.7, but will require some tweaking and there will most definitely be a
performance hit.  That's great information.  This is still just an
experiment for us, and it's not at all guaranteed that I'm going to move
towards smaller pages, but I very much appreciate the insights.

--
Matt Laswell


On Tue, Jul 1, 2014 at 6:51 AM, Burakov, Anatoly wrote:

> Hi Matt,
>
> > I'm curious - is it possible in practical terms to run DPDK without
> > hugepages?
>
> Starting with release 1.7.0, support for VFIO was added, which allows
> using DPDK without hugepages at all (including RX/TX rings) via the
> --no-huge command-line parameter. Bear in mind, though, that to use VFIO
> you'll need IOMMU/VT-d enabled (i.e. no VM support, only host-based) and
> a supported kernel version (3.6+); also, the memory size will be limited
> to 1G, and it won't work with multiprocess. I don't have any performance
> figures on that, unfortunately.
>
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
>


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-06-30 Thread Matt Laswell
Hey Folks,

In my application, I'm seeing some design considerations in a project I'm
working on that push me towards the use of smaller memory page sizes.  I'm
curious - is it possible in practical terms to run DPDK without hugepages?
 If so, does anybody have any practical experience (or a
back-of-the-envelope estimate) of how badly such a configuration would hurt
performance?  For the sake of argument, assume that virtually all of the
memory being used is in pre-allocated mempools (e.g. lots of
rte_mempool_create(), very little rte_malloc()).
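
For concreteness, here is a rough, hypothetical sketch of the pattern I
mean (made-up names and sizes, 1.x-era API):

    #include <rte_mempool.h>
    #include <rte_mbuf.h>
    #include <rte_lcore.h>

    /* Hypothetical: one big packet pool, pre-allocated once at startup. */
    static struct rte_mempool *
    make_pkt_pool(void)
    {
            return rte_mempool_create(
                    "pkt_pool",
                    8192,                  /* number of elements */
                    2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM,
                    256,                   /* per-lcore cache size */
                    sizeof(struct rte_pktmbuf_pool_private),
                    rte_pktmbuf_pool_init, NULL,   /* pool init */
                    rte_pktmbuf_init, NULL,        /* per-mbuf init */
                    rte_socket_id(), 0);           /* socket, flags */
    }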

Thanks in advance for your help.

-- 
Matt Laswell


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-06-30 Thread Jeff Shaw
Hi Matt,

On Mon, Jun 30, 2014 at 05:43:39PM -0500, Matt Laswell wrote:
> Hey Folks,
> 
> In my application, I'm seeing some design considerations in a project I'm
> working on that push me towards the use of smaller memory page sizes.  I'm
> curious - is it possible in practical terms to run DPDK without hugepages?

Yes, but I do not believe an implementation exists.

>  If so, does anybody have any practical experience (or a
> back-of-the-envelope estimate) of how badly such a configuration would hurt
> performance?  For the sake of argument, assume that virtually all of the
> memory being used is in pre-allocated mempools (e.g. lots of
> rte_mempool_create(), very little rte_malloc()).
> 

It is possible, though not recommended if you want "good performance", to use
smaller memory page sizes.  Poor performance results from penalties incurred
due to DTLB misses.  Please consider the following example.

An application pre-allocates several thousand buffers to use for packet
reception and transmission using 4KB pages.  Each buffer contains 2KB worth
of data space, or enough to store the typical maximum Ethernet frame size. 
Since the page size is only 4KB, each DTLB entry can cover translations for
at most two packet buffers.  If the first-level DTLB has, for instance,
64 x 4KB entries, you would only be able to cache about 128 packet-buffer
address translations at any given time (+1,024 if you include the
second-level DTLB).  With 32 x 2MB entries, the DTLB can cover address
translations for 32K packet buffers (1,024 per entry) at any given time.
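
Spelling out the arithmetic (the DTLB sizes are assumptions for the
example, not measurements of a particular CPU):

    4KB pages:  64 L1 entries  x (4KB / 2KB per buffer) =    128 buffers
                512 L2 entries x 2 buffers per entry    = +1,024 buffers
    2MB pages:  32 L1 entries  x (2MB / 2KB per buffer) = 32,768 buffers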

If you believe that your application performance will be negatively impacted
by latencies incurred due to DTLB misses, it is recommended to take steps
which would maximize the DTLB hit rate.

Of course, you will not know how this impacts performance for your application
unless it is tried under realistic conditions.  If you end up doing so, could
you please update the list?


Thanks,
Jeff