[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2015-01-09 Thread Ananyev, Konstantin


> -Original Message-
> From: Liang, Cunming
> Sent: Friday, January 09, 2015 9:41 AM
> To: Ananyev, Konstantin; Stephen Hemminger; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Friday, January 09, 2015 1:06 AM
> > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > Hi Steve,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> > > Sent: Tuesday, December 23, 2014 9:52 AM
> > > To: Stephen Hemminger; Richardson, Bruce
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > > To: Richardson, Bruce
> > > > Cc: Liang, Cunming; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > On Mon, 22 Dec 2014 09:46:03 +
> > > > Bruce Richardson  wrote:
> > > >
> > > > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > > > > ...
> > > > > > > I'm conflicted on this one. However, I think far more 
> > > > > > > applications would
> > be
> > > > > > > broken
> > > > > > > to start having to use thread_id in place of an lcore_id than 
> > > > > > > would be
> > > > broken
> > > > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > > > I'm actually struggling to come up with a large number of 
> > > > > > > scenarios
> > where
> > > > it's
> > > > > > > important to an app to determine the cpu it's running on, 
> > > > > > > compared to
> > the
> > > > large
> > > > > > > number of cases where you need to have a data-structure per 
> > > > > > > thread.
> > In
> > > > DPDK
> > > > > > > libs
> > > > > > > alone, you see this assumption that lcore_id == thread_id a large
> > number
> > > > of
> > > > > > > times.
> > > > > > >
> > > > > > > Despite the slight logical inconsistency, I think it's better to 
> > > > > > > avoid
> > > > introducing
> > > > > > > a thread-id and continue having lcore_id representing a unique 
> > > > > > > thread.
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Ok, I understand it.
> > > > > > I list the implicit meaning if using lcore_id representing the 
> > > > > > unique thread.
> > > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the 
> > > > > > logical
> > > > core id.
> > > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique
> > > > id for thread.
> > > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to 
> > > > > > be used
> > only
> > > > in CASE 1)
> > > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no 
> > > > > > matter
> > > > represent a logical core id.
> > > > > >
> > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 
> > > > > > base on this
> > > > conclusion.
> > > > > >
> > > > > > /Cunming
> > > > >
> > > > > Sorry, I don't like that suggestion either, as having lcore_id values 
> > > > > greater
> > > > > than RTE_MAX_LCORE is terrible, as how will people know how to
> > dimension
> > > > arrays
> > > > > to be indexes by lcore id? Given the choice, if we are not going to 
> > > > > just use
> > > > > lcore_id as a generic thread id, wh

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2015-01-09 Thread Liang, Cunming
> 
> BTW, one more thing: while we are on it  - it is probably a good time to do
> something with our interrupt thread?
> It is a bit strange that we can't use rte_pktmbuf_free() or
> rte_spinlock_recursive_lock() from our own interrupt/alarm handlers
> 
> Konstantin
[Liang, Cunming] I'll think about it.



[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2015-01-09 Thread Liang, Cunming


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Friday, January 09, 2015 1:06 AM
> To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> Hi Steve,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> > Sent: Tuesday, December 23, 2014 9:52 AM
> > To: Stephen Hemminger; Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> >
> > > -Original Message-
> > > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > To: Richardson, Bruce
> > > Cc: Liang, Cunming; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > On Mon, 22 Dec 2014 09:46:03 +
> > > Bruce Richardson  wrote:
> > >
> > > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > > > ...
> > > > > > I'm conflicted on this one. However, I think far more applications 
> > > > > > would
> be
> > > > > > broken
> > > > > > to start having to use thread_id in place of an lcore_id than would 
> > > > > > be
> > > broken
> > > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > > I'm actually struggling to come up with a large number of scenarios
> where
> > > it's
> > > > > > important to an app to determine the cpu it's running on, compared 
> > > > > > to
> the
> > > large
> > > > > > number of cases where you need to have a data-structure per thread.
> In
> > > DPDK
> > > > > > libs
> > > > > > alone, you see this assumption that lcore_id == thread_id a large
> number
> > > of
> > > > > > times.
> > > > > >
> > > > > > Despite the slight logical inconsistency, I think it's better to 
> > > > > > avoid
> > > introducing
> > > > > > a thread-id and continue having lcore_id representing a unique 
> > > > > > thread.
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > Ok, I understand it.
> > > > > I list the implicit meaning if using lcore_id representing the unique 
> > > > > thread.
> > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the 
> > > > > logical
> > > core id.
> > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> unique
> > > id for thread.
> > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be 
> > > > > used
> only
> > > in CASE 1)
> > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no 
> > > > > matter
> > > represent a logical core id.
> > > > >
> > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base 
> > > > > on this
> > > conclusion.
> > > > >
> > > > > /Cunming
> > > >
> > > > Sorry, I don't like that suggestion either, as having lcore_id values 
> > > > greater
> > > > than RTE_MAX_LCORE is terrible, as how will people know how to
> dimension
> > > arrays
> > > > to be indexes by lcore id? Given the choice, if we are not going to 
> > > > just use
> > > > lcore_id as a generic thread id, which is always between 0 and
> > > RTE_MAX_LCORE
> > > > we can look to define a new thread_id variable to hold that. However, it
> should
> > > > have a bounded range.
> > > > From an ease-of-porting perspective, I still think that the simplest 
> > > > option is
> to
> > > > use the existing lcore_id and accept the fact that it's now a thread id 
> > > > rather
> > > > than an actual physical lcore. Question is, is would that cause us lots 
> > > > of
> issues
> > > > in the future?
> > > >
> > > > /Bruce
> > >
> > > The current rte_lcore_id() has different meaning the thread. Your proposal
> will
> > > break code that uses lcore_id to do per

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2015-01-08 Thread Richardson, Bruce
My opinion on this is that the lcore_id is rarely (if ever) used to find the 
actual core a thread is running on. Instead it is used 99% of the time as a 
unique array index per thread, and therefore we can keep that usage by 
just assigning a valid lcore_id to any extra threads created. The suggestion to 
get/set affinities on top of that seems a good one to me also.

/Bruce
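
To make the "unique array index per thread" usage concrete, here is a minimal sketch. It assumes a hypothetical MAX_LCORE constant standing in for RTE_MAX_LCORE and takes the id as a parameter; in a DPDK application the id would come from rte_lcore_id(). The struct and function names are illustrative only and are not taken from the DPDK libs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_MAX_LCORE (a build-time setting in DPDK). */
#define MAX_LCORE 128

/* One slot per thread, indexed by that thread's lcore_id. */
struct pkt_stats {
    uint64_t rx;
    uint64_t tx;
};

static struct pkt_stats stats[MAX_LCORE];

/* Account n received packets to the thread identified by lcore_id.
 * No locking is needed as long as every thread has a distinct id. */
static void stats_add_rx(unsigned lcore_id, uint64_t n)
{
    assert(lcore_id < MAX_LCORE);   /* the id must stay inside the array */
    stats[lcore_id].rx += n;
}

/* Sum over all slots; the result does not depend on which CPU
 * each thread actually ran on. */
static uint64_t stats_total_rx(void)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < MAX_LCORE; i++)
        total += stats[i].rx;
    return total;
}
```

Nothing in this pattern needs the id to name a physical core. It only needs the id to be unique per thread and smaller than the array dimension.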

-Original Message-
From: Ananyev, Konstantin 
Sent: Thursday, January 8, 2015 5:06 PM
To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
Cc: dev at dpdk.org
Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore


Hi Steve,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Tuesday, December 23, 2014 9:52 AM
> To: Stephen Hemminger; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> lcore
> 
> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, December 23, 2014 2:29 AM
> > To: Richardson, Bruce
> > Cc: Liang, Cunming; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> > lcore
> >
> > On Mon, 22 Dec 2014 09:46:03 +
> > Bruce Richardson  wrote:
> >
> > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more 
> > > > > applications would be broken to start having to use thread_id 
> > > > > in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of 
> > > > > scenarios where
> > it's
> > > > > important to an app to determine the cpu it's running on, 
> > > > > compared to the
> > large
> > > > > number of cases where you need to have a data-structure per 
> > > > > thread. In
> > DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a 
> > > > > large number
> > of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better 
> > > > > to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique 
> > > > thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents 
> > > > the logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents 
> > > > an unique
> > id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest 
> > > > to be used only
> > in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value 
> > > > no matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 
> > > > base on this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id 
> > > values greater than RTE_MAX_LCORE is terrible, as how will people 
> > > know how to dimension
> > arrays
> > > to be indexes by lcore id? Given the choice, if we are not going 
> > > to just use lcore_id as a generic thread id, which is always 
> > > between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. 
> > > However, it should have a bounded range.
> > > From an ease-of-porting perspective, I still think that the 
> > > simplest option is to use the existing lcore_id and accept the 
> > > fact that it's now a thread id rather than an actual physical 
> > > lcore. Question is, is would that cause us lots of issues in the future?
> > >
> > > /Bruce
> >
> > The current rte_lcore_id() has different meaning the thread. Your 
> > proposal will break code that uses lcore_id to do per-cpu statistics 
> > and the lcore_config code in the samples.
> > q
> [Liang, Cunming] +1.

A few more thoughts on that subject:

Actually one more place i

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2015-01-08 Thread Ananyev, Konstantin

Hi Steve,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Tuesday, December 23, 2014 9:52 AM
> To: Stephen Hemminger; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, December 23, 2014 2:29 AM
> > To: Richardson, Bruce
> > Cc: Liang, Cunming; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > On Mon, 22 Dec 2014 09:46:03 +
> > Bruce Richardson  wrote:
> >
> > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more applications 
> > > > > would be
> > > > > broken
> > > > > to start having to use thread_id in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of scenarios 
> > > > > where
> > it's
> > > > > important to an app to determine the cpu it's running on, compared to 
> > > > > the
> > large
> > > > > number of cases where you need to have a data-structure per thread. In
> > DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a large 
> > > > > number
> > of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique 
> > > > thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the 
> > > > logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an 
> > > > unique
> > id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be 
> > > > used only
> > in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no 
> > > > matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on 
> > > > this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id values 
> > > greater
> > > than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> > arrays
> > > to be indexes by lcore id? Given the choice, if we are not going to just 
> > > use
> > > lcore_id as a generic thread id, which is always between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. However, it 
> > > should
> > > have a bounded range.
> > > From an ease-of-porting perspective, I still think that the simplest 
> > > option is to
> > > use the existing lcore_id and accept the fact that it's now a thread id 
> > > rather
> > > than an actual physical lcore. Question is, is would that cause us lots 
> > > of issues
> > > in the future?
> > >
> > > /Bruce
> >
> > The current rte_lcore_id() has different meaning the thread. Your proposal 
> > will
> > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > code in the samples.
> > q
> [Liang, Cunming] +1.

A few more thoughts on that subject:

Actually there is one more place in the lib where lcore_id is used (and where 
it should be unique):
rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
So if we are going to replace lcore_id with thread_id as the unique thread 
index, then these functions
have to be updated too.
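
Why the recursive spinlock functions named above depend on a unique id can be seen in a simplified sketch. This is a hypothetical recursive try-lock written with C11 atomics, not the DPDK implementation; the struct layout and the names rec_lock_init, rec_trylock and rec_unlock are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical simplified recursive lock; not the DPDK implementation. */
struct rec_lock {
    atomic_int busy;    /* the underlying spinlock: 0 = free, 1 = taken */
    atomic_int owner;   /* id of the current holder, -1 when free */
    int count;          /* nesting depth, only touched by the holder */
};

static void rec_lock_init(struct rec_lock *l)
{
    atomic_init(&l->busy, 0);
    atomic_init(&l->owner, -1);
    l->count = 0;
}

/* Try to take the lock as the thread with the given id.
 * Returns 1 on success, 0 if another id holds the lock. */
static int rec_trylock(struct rec_lock *l, int id)
{
    if (atomic_load(&l->owner) != id) {
        if (atomic_exchange(&l->busy, 1) != 0)
            return 0;               /* held by a different id */
        atomic_store(&l->owner, id);
    }
    l->count++;                     /* same id: nest one level deeper */
    return 1;
}

static void rec_unlock(struct rec_lock *l, int id)
{
    assert(atomic_load(&l->owner) == id && l->count > 0);
    if (--l->count == 0) {
        atomic_store(&l->owner, -1);
        atomic_store(&l->busy, 0);
    }
}
```

The lock treats "same id" as "same thread re-entering". If two different threads ever presented the same id, the second one would pass the owner check and enter the critical section alongside the first, so mutual exclusion would be lost.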

About maintaining our own unique thread_id inside shared memory 
(_get_linear_tid()/_put_linear_tid()):
there is one thing that worries me with that approach.
In case of abnormal process termination, the TIDs used by that process will 
remain 'reserved',
and there is no way to know which TIDs were used by the terminated process.
So there could be a situation with DPDK multi
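
The leak described above can be illustrated with a small sketch of a linear id allocator. It is single-threaded and keeps the bitmap in an ordinary static variable, where the RFC's approach would keep it in shared memory; MAX_TID and the names tid_get, tid_put and tid_in_use are hypothetical and are not the RFC's _get_linear_tid()/_put_linear_tid().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TID 64   /* hypothetical bound on linear thread ids */

/* Bitmap of reserved ids; bit i set means id i is taken. */
static uint64_t tid_bitmap;

/* Reserve the lowest free id, or return -1 if none is left. */
static int tid_get(void)
{
    for (int i = 0; i < MAX_TID; i++) {
        if (!(tid_bitmap & (UINT64_C(1) << i))) {
            tid_bitmap |= UINT64_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Give an id back so that it can be handed out again. */
static void tid_put(int tid)
{
    assert(tid >= 0 && tid < MAX_TID);
    tid_bitmap &= ~(UINT64_C(1) << tid);
}

/* Number of ids currently marked as reserved. */
static int tid_in_use(void)
{
    int n = 0;
    for (int i = 0; i < MAX_TID; i++)
        n += (int)((tid_bitmap >> i) & 1);
    return n;
}
```

An id returns to the pool only through tid_put(). A process that dies without calling it leaves its bits set, and the bitmap alone records nothing about which process owned them.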

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-23 Thread Liang, Cunming


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, December 23, 2014 2:29 AM
> To: Richardson, Bruce
> Cc: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> On Mon, 22 Dec 2014 09:46:03 +
> Bruce Richardson  wrote:
> 
> > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > ...
> > > > I'm conflicted on this one. However, I think far more applications 
> > > > would be
> > > > broken
> > > > to start having to use thread_id in place of an lcore_id than would be
> broken
> > > > by having the lcore_id no longer actually correspond to a core.
> > > > I'm actually struggling to come up with a large number of scenarios 
> > > > where
> it's
> > > > important to an app to determine the cpu it's running on, compared to 
> > > > the
> large
> > > > number of cases where you need to have a data-structure per thread. In
> DPDK
> > > > libs
> > > > alone, you see this assumption that lcore_id == thread_id a large number
> of
> > > > times.
> > > >
> > > > Despite the slight logical inconsistency, I think it's better to avoid
> introducing
> > > > a thread-id and continue having lcore_id representing a unique thread.
> > > >
> > > > /Bruce
> > >
> > > Ok, I understand it.
> > > I list the implicit meaning if using lcore_id representing the unique 
> > > thread.
> > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> core id.
> > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique
> id for thread.
> > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used 
> > > only
> in CASE 1)
> > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> represent a logical core id.
> > >
> > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on 
> > > this
> conclusion.
> > >
> > > /Cunming
> >
> > Sorry, I don't like that suggestion either, as having lcore_id values 
> > greater
> > than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> arrays
> > to be indexes by lcore id? Given the choice, if we are not going to just use
> > lcore_id as a generic thread id, which is always between 0 and
> RTE_MAX_LCORE
> > we can look to define a new thread_id variable to hold that. However, it 
> > should
> > have a bounded range.
> > From an ease-of-porting perspective, I still think that the simplest option 
> > is to
> > use the existing lcore_id and accept the fact that it's now a thread id 
> > rather
> > than an actual physical lcore. Question is, is would that cause us lots of 
> > issues
> > in the future?
> >
> > /Bruce
> 
> The current rte_lcore_id() has different meaning the thread. Your proposal 
> will
> break code that uses lcore_id to do per-cpu statistics and the lcore_config
> code in the samples.
> q
[Liang, Cunming] +1. 


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-23 Thread Liang, Cunming


> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Monday, December 22, 2014 6:02 PM
> To: Richardson, Bruce; Liang, Cunming
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Monday, December 22, 2014 10:46 AM
> > To: Liang, Cunming
> > Cc: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > ...
> > > > I'm conflicted on this one. However, I think far more applications would
> > be
> > > > broken
> > > > to start having to use thread_id in place of an lcore_id than would be
> > broken
> > > > by having the lcore_id no longer actually correspond to a core.
> > > > I'm actually struggling to come up with a large number of scenarios 
> > > > where
> > it's
> > > > important to an app to determine the cpu it's running on, compared to
> > the large
> > > > number of cases where you need to have a data-structure per thread. In
> > DPDK
> > > > libs
> > > > alone, you see this assumption that lcore_id == thread_id a large number
> > of
> > > > times.
> > > >
> > > > Despite the slight logical inconsistency, I think it's better to avoid
> > introducing
> > > > a thread-id and continue having lcore_id representing a unique thread.
> > > >
> > > > /Bruce
> > >
> > > Ok, I understand it.
> > > I list the implicit meaning if using lcore_id representing the unique 
> > > thread.
> > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > core id.
> > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique id for thread.
> > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> > only in CASE 1)
> > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > represent a logical core id.
> > >
> > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on 
> > > this
> > conclusion.
> > >
> > > /Cunming
> >
> > Sorry, I don't like that suggestion either, as having lcore_id values 
> > greater
> > than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> > arrays
> > to be indexes by lcore id? 
[Liang, Cunming] For dimensioning arrays, we shall have RTE_MAX_THREAD_ID.
The lcore id no longer means logical core, so why still use RTE_MAX_LCORE as 
the dimension?
As I originally had it in mind, I don't expect to change lcore_config. 
RTE_MAX_LCORE is only used to identify the legal ids for logical cores.
So nothing changes when id < RTE_MAX_LCORE, while id > RTE_MAX_LCORE 
causes a failure in the lcore API.
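
A minimal sketch of the two bounds described in this reply, assuming hypothetical values for both constants. MAX_THREAD_ID stands in for the proposed RTE_MAX_THREAD_ID, and the helper names are made up for illustration. Ids equal to the lcore bound are treated as thread-only ids, following item 2 of the quoted list.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LCORE     128    /* hypothetical stand-in for RTE_MAX_LCORE */
#define MAX_THREAD_ID 1024   /* hypothetical stand-in for RTE_MAX_THREAD_ID */

static_assert(MAX_LCORE <= MAX_THREAD_ID,
              "logical core ids must fit inside the thread id range");

/* Per-thread data is dimensioned by the larger bound. */
static unsigned long per_thread_counter[MAX_THREAD_ID];

/* True if the id names a logical core. */
static bool id_is_lcore(unsigned id)
{
    return id < MAX_LCORE;
}

/* An lcore-only API keeps working for core ids and fails for the rest. */
static int lcore_api_check(unsigned id)
{
    return id_is_lcore(id) ? 0 : -1;
}

/* Per-thread data accepts any id below the thread bound. */
static int counter_inc(unsigned id)
{
    if (id >= MAX_THREAD_ID)
        return -1;
    per_thread_counter[id]++;
    return 0;
}
```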

> > Given the choice, if we are not going to just use
> > lcore_id as a generic thread id, which is always between 0 and
> > RTE_MAX_LCORE
> > we can look to define a new thread_id variable to hold that. However, it
> > should
> > have a bounded range.
[Liang, Cunming] Agreed. If we merge the lcore id with the linear thread id, we 
require RTE_MAX_THREAD_ID anyway.
> > From an ease-of-porting perspective, I still think that the simplest option 
> > is to
> > use the existing lcore_id and accept the fact that it's now a thread id 
> > rather
> > than an actual physical lcore. 
[Liang, Cunming] I'm not sure: do you mean to propose extending lcore_config to 
be a per-thread context instead of per-lcore?
If we accept the fact that lcore_id is now a thread id, how do we decide whether 
the physical lcore is in the core mask or not?
> > Question is, is would that cause us lots of issues
> > in the future?
[Liang, Cunming] Personally I don't like having the lcore id sometimes stand 
for a logical core id and sometimes stand for a thread id.
The benefit looks like avoiding trivial changes, but it would actually change 
the meaning of the API and the implementation.
The linear thread id I propose is new, but we can control and estimate the 
limited changes and where they happen.
> >
> I would prefer keeping the RTE_MAX_LCORES as Bruce suggests and
> determine the HW core on base of following condition if we really have to know
> this.
> 
> int num_cores_online = count of cores encountered in the core mask provided by
> cmdline parameter
[Liang, Cunming] In this way, if we have core mask 0xf0, num_cores_online will 
be 4.
The rte_lcore_id() values for the logical cores will be 0, 1, 2, 3, which are 
no longer 4, 5, 6, 7.
That's probably all right if we give up the original meaning of lcore_id 
and change it to identify a unique thread id.
But I don't think having a dynamic num_cores_online is a good idea.
If one day we plan to support lcore hot plug, num_cores_online will 
change on the fly.
It would be bad to get an id which is already occupied by some thread.
> 
> Rte_lcore_id() < num_cores_online -> physical core (pthread first started on 
> the
> core)
> 
> Rte_lcore_id() >= num_cores_online -> pthread created by rte_pthread_create
> 
> Mirek
> 
> > /Bruce
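
The core mask 0xf0 example from this exchange can be worked through in a short sketch. It assumes a 64-bit mask and uses hypothetical helper names; the dense numbering follows the reading given in the reply (cores 4..7 become ids 0..3), and the last helper follows the rule proposed in the quoted text for telling core threads from extra pthreads.

```c
#include <assert.h>
#include <stdint.h>

/* Count the cores set in a core mask (num_cores_online in the thread). */
static unsigned count_cores(uint64_t coremask)
{
    unsigned n = 0;
    for (; coremask != 0; coremask &= coremask - 1)   /* clear lowest set bit */
        n++;
    return n;
}

/* Dense id of a core: its rank among the set bits of the mask,
 * or -1 if the core is not part of the mask. */
static int dense_id_of_core(uint64_t coremask, unsigned core)
{
    assert(core < 64);   /* a 64-bit mask cannot describe more cores */
    if (!(coremask & (UINT64_C(1) << core)))
        return -1;
    return (int)count_cores(coremask & ((UINT64_C(1) << core) - 1));
}

/* Under the quoted proposal, ids below num_cores_online belong to the
 * threads first started on the cores; higher ids belong to extra pthreads. */
static int id_is_core_thread(uint64_t coremask, unsigned id)
{
    return id < count_cores(coremask);
}
```

With this numbering the boundary between the two classes of ids depends on the mask, which is the reason given above for why a change in the set of online cores (for example lcore hot plug) could make the boundary collide with ids already handed out.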


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-23 Thread Bruce Richardson
On Tue, Dec 23, 2014 at 09:19:54AM +, Walukiewicz, Miroslaw wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> > Hemminger
> > Sent: Monday, December 22, 2014 7:29 PM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > 
> > On Mon, 22 Dec 2014 09:46:03 +
> > Bruce Richardson  wrote:
> > 
> > > On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > > > ...
> > > > > I'm conflicted on this one. However, I think far more applications 
> > > > > would
> > be
> > > > > broken
> > > > > to start having to use thread_id in place of an lcore_id than would be
> > broken
> > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > I'm actually struggling to come up with a large number of scenarios
> > where it's
> > > > > important to an app to determine the cpu it's running on, compared to
> > the large
> > > > > number of cases where you need to have a data-structure per thread.
> > In DPDK
> > > > > libs
> > > > > alone, you see this assumption that lcore_id == thread_id a large
> > number of
> > > > > times.
> > > > >
> > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > introducing
> > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > >
> > > > > /Bruce
> > > >
> > > > Ok, I understand it.
> > > > I list the implicit meaning if using lcore_id representing the unique 
> > > > thread.
> > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the 
> > > > logical
> > core id.
> > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique id for thread.
> > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be 
> > > > used
> > only in CASE 1)
> > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no 
> > > > matter
> > represent a logical core id.
> > > >
> > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on 
> > > > this
> > conclusion.
> > > >
> > > > /Cunming
> > >
> > > Sorry, I don't like that suggestion either, as having lcore_id values 
> > > greater
> > > than RTE_MAX_LCORE is terrible, as how will people know how to
> > dimension arrays
> > > to be indexes by lcore id? Given the choice, if we are not going to just 
> > > use
> > > lcore_id as a generic thread id, which is always between 0 and
> > RTE_MAX_LCORE
> > > we can look to define a new thread_id variable to hold that. However, it
> > should
> > > have a bounded range.
> > > From an ease-of-porting perspective, I still think that the simplest 
> > > option is
> > to
> > > use the existing lcore_id and accept the fact that it's now a thread id 
> > > rather
> > > than an actual physical lcore. Question is, is would that cause us lots of
> > issues
> > > in the future?
> > >
> > > /Bruce
> > 
> > The current rte_lcore_id() has different meaning the thread. Your proposal
> > will
> > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > code in the samples.
> > q
> It depends on application context and how application treats rte_lcore_id() 
> core. When number of the threads will not exceed the number of cores (let's 
> say old-fashioned DPDK application) all stuff like per-cpu statistics will 
> work correctly. 
> 
> When we treat threads on cores as ordinary threads as we introducing the 
> special function rte_pthread_create() - the meaning of rte_lcore_id() changes 
> to indicate 
>  thread number what is correct under new assumptions and new application 
> model.
> 
> I do not  want to limit DPDK design  to only per-cpu application. There is 
> much more application models that could be supported using DPDK. 
> Current per-cpu approach is only a subset of the possible applications.
> 
> Maybe we should indicate something like CONFIG_RTE_PTHREAD_ENABLE to change a 
> meaning of rte_lcore_id() and introducing rte_pthread_create() family. 
> 
> Mirek
> 
From the discussion it does look to me like we do need a separate thread id
value, separate from the core id. Unfortunately that means that in many (most?) 
places in libs
and examples where we use lcore_id right now, we probably need to use the new
thread id. :-(

/Bruce


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-22 Thread Stephen Hemminger
On Mon, 22 Dec 2014 09:46:03 +
Bruce Richardson  wrote:

> On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > ...
> > > I'm conflicted on this one. However, I think far more applications would 
> > > be
> > > broken
> > > to start having to use thread_id in place of an lcore_id than would be 
> > > broken
> > > by having the lcore_id no longer actually correspond to a core.
> > > I'm actually struggling to come up with a large number of scenarios where 
> > > it's
> > > important to an app to determine the cpu it's running on, compared to the 
> > > large
> > > number of cases where you need to have a data-structure per thread. In 
> > > DPDK
> > > libs
> > > alone, you see this assumption that lcore_id == thread_id a large number 
> > > of
> > > times.
> > > 
> > > Despite the slight logical inconsistency, I think it's better to avoid 
> > > introducing
> > > a thread-id and continue having lcore_id representing a unique thread.
> > > 
> > > /Bruce
> > 
> > Ok, I understand it. 
> > I list the implicit meaning if using lcore_id representing the unique 
> > thread.
> > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical 
> > core id.
> > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique 
> > id for thread.
> > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used 
> > only in CASE 1)
> > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter 
> > represent a logical core id.
> > 
> > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on 
> > this conclusion.
> > 
> > /Cunming
> 
> Sorry, I don't like that suggestion either, as having lcore_id values greater
> than RTE_MAX_LCORE is terrible, as how will people know how to dimension 
> arrays
> to be indexes by lcore id? Given the choice, if we are not going to just use
> lcore_id as a generic thread id, which is always between 0 and RTE_MAX_LCORE
> we can look to define a new thread_id variable to hold that. However, it 
> should
> have a bounded range.
> From an ease-of-porting perspective, I still think that the simplest option 
> is to
> use the existing lcore_id and accept the fact that it's now a thread id rather
> than an actual physical lcore. Question is, is would that cause us lots of 
> issues
> in the future?
> 
> /Bruce

The current rte_lcore_id() has a different meaning than a thread id. Your 
proposal will
break code that uses lcore_id to do per-cpu statistics and the lcore_config
code in the samples.


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-22 Thread Walukiewicz, Miroslaw
> -Original Message-
> From: Richardson, Bruce
> Sent: Monday, December 22, 2014 10:46 AM
> To: Liang, Cunming
> Cc: Walukiewicz, Miroslaw; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> > ...
> > > I'm conflicted on this one. However, I think far more applications would
> be
> > > broken
> > > to start having to use thread_id in place of an lcore_id than would be
> broken
> > > by having the lcore_id no longer actually correspond to a core.
> > > I'm actually struggling to come up with a large number of scenarios where
> it's
> > > important to an app to determine the cpu it's running on, compared to
> the large
> > > number of cases where you need to have a data-structure per thread. In
> DPDK
> > > libs
> > > alone, you see this assumption that lcore_id == thread_id a large number
> of
> > > times.
> > >
> > > Despite the slight logical inconsistency, I think it's better to avoid
> introducing
> > > a thread-id and continue having lcore_id representing a unique thread.
> > >
> > > /Bruce
> >
> > Ok, I understand it.
> > I list the implicit meaning if using lcore_id representing the unique 
> > thread.
> > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> core id.
> > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> unique id for thread.
> > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> only in CASE 1)
> > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> represent a logical core id.
> >
> > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> conclusion.
> >
> > /Cunming
> 
> Sorry, I don't like that suggestion either, as having lcore_id values greater
> than RTE_MAX_LCORE is terrible, as how will people know how to dimension
> arrays
> to be indexes by lcore id? Given the choice, if we are not going to just use
> lcore_id as a generic thread id, which is always between 0 and
> RTE_MAX_LCORE
> we can look to define a new thread_id variable to hold that. However, it
> should
> have a bounded range.
> From an ease-of-porting perspective, I still think that the simplest option 
> is to
> use the existing lcore_id and accept the fact that it's now a thread id rather
> than an actual physical lcore. Question is, is would that cause us lots of 
> issues
> in the future?
> 
I would prefer keeping RTE_MAX_LCORE as the bound, as Bruce suggests, and
determining whether an id maps to a HW core from the following condition, if we
really have to know this.

int num_cores_online = count of cores set in the core mask provided by the
cmdline parameter

rte_lcore_id() < num_cores_online -> physical core (the pthread first started on
the core)

rte_lcore_id() >= num_cores_online -> pthread created by rte_pthread_create()

Mirek
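
A minimal sketch of the condition above, in plain C without any DPDK headers.
The helper names are hypothetical, and it assumes the ids of the threads
started from the core mask are numbered densely from 0, which is an assumption
of this sketch rather than something EAL is known to guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of cores selected by a 64-bit core mask. */
static unsigned count_cores_online(uint64_t coremask)
{
    unsigned n = 0;

    while (coremask != 0) {
        coremask &= coremask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: true if 'id' belongs to a thread that was started
 * on a core from the core mask, false if it belongs to a pthread created
 * later. Assumes ids are handed out densely, pinned threads first.
 */
static bool id_is_pinned_core_thread(unsigned id, uint64_t coremask)
{
    return id < count_cores_online(coremask);
}
```

With a core mask of 0xf, ids 0 to 3 would be treated as pinned core threads and
any id from 4 upwards as an additional pthread.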

> /Bruce


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-22 Thread Bruce Richardson
On Mon, Dec 22, 2014 at 01:51:27AM +, Liang, Cunming wrote:
> ...
> > I'm conflicted on this one. However, I think far more applications would be
> > broken
> > to start having to use thread_id in place of an lcore_id than would be 
> > broken
> > by having the lcore_id no longer actually correspond to a core.
> > I'm actually struggling to come up with a large number of scenarios where 
> > it's
> > important to an app to determine the cpu it's running on, compared to the 
> > large
> > number of cases where you need to have a data-structure per thread. In DPDK
> > libs
> > alone, you see this assumption that lcore_id == thread_id a large number of
> > times.
> > 
> > Despite the slight logical inconsistency, I think it's better to avoid 
> > introducing
> > a thread-id and continue having lcore_id representing a unique thread.
> > 
> > /Bruce
> 
> Ok, I understand it. 
> I list the implicit meaning if using lcore_id representing the unique thread.
> 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical 
> core id.
> 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an unique id 
> for thread.
> 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used 
> only in CASE 1)
> 4). rte_lcore_id() can be used in CASE 2), but the return value no matter 
> represent a logical core id.
> 
> If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this 
> conclusion.
> 
> /Cunming

Sorry, I don't like that suggestion either, as having lcore_id values greater
than RTE_MAX_LCORE is terrible: how will people know how to dimension arrays
that are indexed by lcore id? Given the choice, if we are not going to just use
lcore_id as a generic thread id, which is always between 0 and RTE_MAX_LCORE,
we can look to define a new thread_id variable to hold that. However, it should
have a bounded range.
From an ease-of-porting perspective, I still think that the simplest option is
to use the existing lcore_id and accept the fact that it's now a thread id rather
than an actual physical lcore. The question is, would that cause us lots of
issues in the future?

/Bruce
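
To illustrate the array-dimensioning concern, here is a rough sketch of a
per-thread table indexed by a bounded id. MAX_THREAD_ID is a stand-in for a
bound such as RTE_MAX_LCORE; its value, the struct and the function name are
all made up for the example.

```c
#include <stdint.h>

/* Stand-in for a compile-time bound on thread ids (illustrative value). */
#define MAX_THREAD_ID 128

struct pkt_stats {
    uint64_t rx;
    uint64_t tx;
};

/* The table can be sized statically only because the id range is bounded. */
static struct pkt_stats stats[MAX_THREAD_ID];

/*
 * Count one received packet for thread 'id'.
 * Returns 0 on success, -1 if the id falls outside the table.
 */
static int stats_count_rx(unsigned id)
{
    if (id >= MAX_THREAD_ID)
        return -1;
    stats[id].rx++;
    return 0;
}
```

If ids could exceed the bound, every such table would need either a range
check like the one above or dynamic sizing.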


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-22 Thread Liang, Cunming
...
> I'm conflicted on this one. However, I think far more applications would be
> broken
> to start having to use thread_id in place of an lcore_id than would be broken
> by having the lcore_id no longer actually correspond to a core.
> I'm actually struggling to come up with a large number of scenarios where it's
> important to an app to determine the cpu it's running on, compared to the 
> large
> number of cases where you need to have a data-structure per thread. In DPDK
> libs
> alone, you see this assumption that lcore_id == thread_id a large number of
> times.
> 
> Despite the slight logical inconsistency, I think it's better to avoid 
> introducing
> a thread-id and continue having lcore_id representing a unique thread.
> 
> /Bruce

Ok, I understand.
Here is the implicit meaning if lcore_id is used to represent a unique thread.
1). When lcore_id is less than RTE_MAX_LCORE, it still represents the logical core
id.
2). When lcore_id is greater than or equal to RTE_MAX_LCORE, it represents a unique
id for the thread.
3). Most of the APIs in rte_lcore.h (except rte_lcore_id()) should be used only
in CASE 1).
4). rte_lcore_id() can be used in CASE 2), but the return value no longer
represents a logical core id.

If most of us feel it's acceptable, I'll prepare RFC v2 based on this
conclusion.

/Cunming


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-19 Thread Bruce Richardson
On Fri, Dec 19, 2014 at 01:28:47AM +, Liang, Cunming wrote:
> 
> 
> > -Original Message-
> > From: Walukiewicz, Miroslaw
> > Sent: Thursday, December 18, 2014 8:20 PM
> > To: Liang, Cunming; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > 
> > I have another question regarding your patch.
> > 
> >  Could we extend values returned by rte_lcore_id() to set them per thread 
> > (really
> > the DPDK lcore is a pthread but started on specific core) instead of 
> > creating linear
> > thread id.
> [Liang, Cunming] As you said, __lcore_id is already per thread. 
> Per the semantic meaning, it stands for logic cpu id. 
> When multi-thread running on the same lcore, they should get the same value 
> return by rte_lcore_id().
> The same effective like 'schedu_getcpu()', but less using cost.
> > 
> > The patch would be much simpler and will work same way. The only change
> > would be extending rte_lcore_id when rte_pthread_create() is called.
> [Liang, Cunming] I ever think about it which using rte_lcore_id() to get 
> unique id per pthread rather than have a new API.
> But the name lcore actually no longer identify for cpu id. It may impact all 
> existing user application who use the exact meaning of it.
> How do you think ?
> > 

I'm conflicted on this one. However, I think far more applications would be
broken by having to start using thread_id in place of an lcore_id than would be
broken by having the lcore_id no longer actually correspond to a core.
I'm actually struggling to come up with a large number of scenarios where it's
important to an app to determine the cpu it's running on, compared to the large
number of cases where you need to have a data-structure per thread. In DPDK libs
alone, you see this assumption that lcore_id == thread_id a large number of
times.

Despite the slight logical inconsistency, I think it's better to avoid
introducing a thread-id and continue having lcore_id represent a unique thread.

/Bruce

> > The value __lcore_id has really an attribute __thread that means it is 
> > valid not
> > only per CPU core but also per thread.
> > 
> > The mempools, timers, statistics would work without any modifications in 
> > that
> > environment.
> > 
> >  I do not see any reason why old legacy DPDK applications would not work in 
> > that
> > model.
> > 
> > Mirek
> > 
> > > -Original Message-
> > > From: Liang, Cunming
> > > Sent: Monday, December 15, 2014 12:53 PM
> > > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Hi Mirek,
> > >
> > > That sounds great.
> > > Looking forward to it.
> > >
> > > -Cunming
> > >
> > > > -Original Message-
> > > > From: Walukiewicz, Miroslaw
> > > > Sent: Monday, December 15, 2014 7:11 PM
> > > > To: Liang, Cunming; dev at dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > Hi Cunming,
> > > >
> > > > The timers could be used by any application/library started as a 
> > > > standard
> > > > pthread.
> > > > Each pthread needs to have assigned some identifier same way as you are
> > > doing
> > > > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> > > examples)
> > > >
> > > > I made series of patches extending the rte timers API to use with such 
> > > > kind
> > > of
> > > > identifier keeping existing API working also.
> > > >
> > > > I will send it soon.
> > > >
> > > > Mirek
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Liang, Cunming
> > > > > Sent: Friday, December 12, 2014 6:45 AM
> > > > > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per 
> > > > > lcore
> > > > >
> > > > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> > > letter.
> > > > > For 'rte_timer', I only expect it be used within the 'legacy 
> > > > > per-lcore'
> > > pthread.
> > > > > I'm appreciate if you can give me some cases which can't use it to 
> > > > > fit.
> > > > > In case have t

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-19 Thread Liang, Cunming


> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Thursday, December 18, 2014 8:20 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> I have another question regarding your patch.
> 
>  Could we extend values returned by rte_lcore_id() to set them per thread 
> (really
> the DPDK lcore is a pthread but started on specific core) instead of creating 
> linear
> thread id.
[Liang, Cunming] As you said, __lcore_id is already per thread.
Semantically, it stands for the logical cpu id.
When multiple threads run on the same lcore, they should get the same value
returned by rte_lcore_id().
The effect is the same as 'sched_getcpu()', but at lower cost.
> 
> The patch would be much simpler and will work same way. The only change
> would be extending rte_lcore_id when rte_pthread_create() is called.
[Liang, Cunming] I did consider using rte_lcore_id() to get a unique
id per pthread rather than adding a new API.
But then the name lcore would no longer identify a cpu id. It may impact all
existing user applications that rely on its exact meaning.
What do you think?
> 
> The value __lcore_id has really an attribute __thread that means it is valid 
> not
> only per CPU core but also per thread.
> 
> The mempools, timers, statistics would work without any modifications in that
> environment.
> 
>  I do not see any reason why old legacy DPDK applications would not work in 
> that
> model.
> 
> Mirek
> 
> > -Original Message-
> > From: Liang, Cunming
> > Sent: Monday, December 15, 2014 12:53 PM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Hi Mirek,
> >
> > That sounds great.
> > Looking forward to it.
> >
> > -Cunming
> >
> > > -----Original Message-
> > > From: Walukiewicz, Miroslaw
> > > Sent: Monday, December 15, 2014 7:11 PM
> > > To: Liang, Cunming; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Hi Cunming,
> > >
> > > The timers could be used by any application/library started as a standard
> > > pthread.
> > > Each pthread needs to have assigned some identifier same way as you are
> > doing
> > > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> > examples)
> > >
> > > I made series of patches extending the rte timers API to use with such 
> > > kind
> > of
> > > identifier keeping existing API working also.
> > >
> > > I will send it soon.
> > >
> > > Mirek
> > >
> > >
> > > > -Original Message-
> > > > From: Liang, Cunming
> > > > Sent: Friday, December 12, 2014 6:45 AM
> > > > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> > letter.
> > > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> > pthread.
> > > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > > prerequisites and limitations.
> > > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> > pthread
> > > > init by rte_pthread_prepare)
> > > > 2. As 'rte_timer' is not preemptable, when using
> > rte_timer_manager/reset in
> > > > multi-pthread, make sure they're not on the same core.
> > > >
> > > > -Cunming
> > > >
> > > > > -Original Message-
> > > > > From: Walukiewicz, Miroslaw
> > > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > > To: Liang, Cunming; dev at dpdk.org
> > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > lcore
> > > > >
> > > > > Thank you Cunming for explanation.
> > > > >
> > > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > > spinlocks.
> > > > >
> > > > > Mirek
> > > > >
> > > > > > -Original Message-
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming
> > Liang
> > > > > >

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-18 Thread Olivier MATZ
Hi,

On 12/18/2014 03:32 PM, Bruce Richardson wrote:
> On Thu, Dec 18, 2014 at 12:20:07PM +, Walukiewicz, Miroslaw wrote:
>> I have another question regarding your patch.
>>
>>  Could we extend values returned by rte_lcore_id() to set them per thread 
>> (really the DPDK lcore is a pthread but started on specific core) instead of 
>> creating linear thread id. 
>>
>> The patch would be much simpler and will work same way. The only change 
>> would be extending rte_lcore_id when rte_pthread_create() is called. 
>>
>> The value __lcore_id has really an attribute __thread that means it is valid 
>> not only per CPU core but also per thread.
>>
>> The mempools, timers, statistics would work without any modifications in 
>> that environment.
>>
>>  I do not see any reason why old legacy DPDK applications would not work in 
>> that model. 
>>
>> Mirek
> 
> Definite +1 here. 

One remark though: it looks like the rte_rings (and therefore the
rte_mempools) are designed with the assumption that each execution
unit is alone on its core.

As explained in [1], there is a risk that a pthread is interrupted
by the kernel at a bad moment. Another thread can then be
blocked, spinning while it waits for a variable to change its value.

The same could also occur with spinlocks, which are not designed
to wake up another pthread when the lock is held (as pthread locks are).

And finally, having several pthreads per core implies that the
application should be designed with large queues: if a pthread is
not scheduled for 10 ms, that represents 100K packets at 10 Mpps.
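
The arithmetic behind that figure, as a small C sketch. The function names are
hypothetical; rounding up to a power of two is only there because ring sizes
are commonly chosen that way, and is an assumption of the example.

```c
#include <stdint.h>

/* Packets that accumulate while a thread is not scheduled. */
static uint64_t backlog_packets(uint64_t rate_pps, uint64_t stall_us)
{
    return rate_pps * stall_us / 1000000ULL;
}

/* Smallest power of two that is >= v (v must be at least 1). */
static uint64_t next_pow2(uint64_t v)
{
    uint64_t p = 1;

    while (p < v)
        p <<= 1;
    return p;
}
```

For 10 Mpps and a 10 ms stall, backlog_packets(10000000, 10000) gives 100000
packets, and next_pow2() of that is 131072 queue entries.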

I don't say it's impossible to do it, but I think it's not so
simple :)

Regards,
Olivier

[1] http://dpdk.org/ml/archives/dev/2013-November/000714.html


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-18 Thread Bruce Richardson
On Thu, Dec 18, 2014 at 12:20:07PM +, Walukiewicz, Miroslaw wrote:
> I have another question regarding your patch.
> 
>  Could we extend values returned by rte_lcore_id() to set them per thread 
> (really the DPDK lcore is a pthread but started on specific core) instead of 
> creating linear thread id. 
> 
> The patch would be much simpler and will work same way. The only change would 
> be extending rte_lcore_id when rte_pthread_create() is called. 
> 
> The value __lcore_id has really an attribute __thread that means it is valid 
> not only per CPU core but also per thread.
> 
> The mempools, timers, statistics would work without any modifications in that 
> environment.
> 
>  I do not see any reason why old legacy DPDK applications would not work in 
> that model. 
> 
> Mirek

Definite +1 here. 

/Bruce

> 
> > -Original Message-
> > From: Liang, Cunming
> > Sent: Monday, December 15, 2014 12:53 PM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > 
> > Hi Mirek,
> > 
> > That sounds great.
> > Looking forward to it.
> > 
> > -Cunming
> > 
> > > -Original Message-
> > > From: Walukiewicz, Miroslaw
> > > Sent: Monday, December 15, 2014 7:11 PM
> > > To: Liang, Cunming; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Hi Cunming,
> > >
> > > The timers could be used by any application/library started as a standard
> > > pthread.
> > > Each pthread needs to have assigned some identifier same way as you are
> > doing
> > > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> > examples)
> > >
> > > I made series of patches extending the rte timers API to use with such 
> > > kind
> > of
> > > identifier keeping existing API working also.
> > >
> > > I will send it soon.
> > >
> > > Mirek
> > >
> > >
> > > > -Original Message-
> > > > From: Liang, Cunming
> > > > Sent: Friday, December 12, 2014 6:45 AM
> > > > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> > letter.
> > > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> > pthread.
> > > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > > prerequisites and limitations.
> > > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> > pthread
> > > > init by rte_pthread_prepare)
> > > > 2. As 'rte_timer' is not preemptable, when using
> > rte_timer_manager/reset in
> > > > multi-pthread, make sure they're not on the same core.
> > > >
> > > > -Cunming
> > > >
> > > > > -Original Message-
> > > > > From: Walukiewicz, Miroslaw
> > > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > > To: Liang, Cunming; dev at dpdk.org
> > > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> > lcore
> > > > >
> > > > > Thank you Cunming for explanation.
> > > > >
> > > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > > spinlocks.
> > > > >
> > > > > Mirek
> > > > >
> > > > > > -Original Message-
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming
> > Liang
> > > > > > Sent: Thursday, December 11, 2014 3:05 AM
> > > > > > To: dev at dpdk.org
> > > > > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > > > >
> > > > > >
> > > > > > Scope & Usage Scenario
> > > > > > 
> > > > > >
> > > > > > DPDK usually pin pthread per core to avoid task switch overhead. It
> > gains
> > > > > > performance a lot, but it's not efficient in all cases. In some 
> > > > > > cases, it
> > may
> > > > > > too expensive to use the

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-18 Thread Walukiewicz, Miroslaw
I have another question regarding your patch.

Could we extend the values returned by rte_lcore_id() so that they are set per
thread (a DPDK lcore is really a pthread started on a specific core) instead of
creating a linear thread id?

The patch would be much simpler and would work the same way. The only change
would be extending rte_lcore_id when rte_pthread_create() is called.

The variable __lcore_id actually has the __thread attribute, which means it is
valid not only per CPU core but also per thread.

The mempools, timers and statistics would work without any modifications in that
environment.

I do not see any reason why old legacy DPDK applications would not work in
that model.

Mirek

> -Original Message-
> From: Liang, Cunming
> Sent: Monday, December 15, 2014 12:53 PM
> To: Walukiewicz, Miroslaw; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Hi Mirek,
> 
> That sounds great.
> Looking forward to it.
> 
> -Cunming
> 
> > -Original Message-
> > From: Walukiewicz, Miroslaw
> > Sent: Monday, December 15, 2014 7:11 PM
> > To: Liang, Cunming; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Hi Cunming,
> >
> > The timers could be used by any application/library started as a standard
> > pthread.
> > Each pthread needs to have assigned some identifier same way as you are
> doing
> > it for mempools (the rte_linear_thread_id and rte_lcore_id are good
> examples)
> >
> > I made series of patches extending the rte timers API to use with such kind
> of
> > identifier keeping existing API working also.
> >
> > I will send it soon.
> >
> > Mirek
> >
> >
> > > -Original Message-
> > > From: Liang, Cunming
> > > Sent: Friday, December 12, 2014 6:45 AM
> > > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Thanks Mirek. That's a good point which wasn't mentioned in cover
> letter.
> > > For 'rte_timer', I only expect it be used within the 'legacy per-lcore'
> pthread.
> > > I'm appreciate if you can give me some cases which can't use it to fit.
> > > In case have to use 'rte_timer' in multi-pthread, there are some
> > > prerequisites and limitations.
> > > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do
> pthread
> > > init by rte_pthread_prepare)
> > > 2. As 'rte_timer' is not preemptable, when using
> rte_timer_manager/reset in
> > > multi-pthread, make sure they're not on the same core.
> > >
> > > -Cunming
> > >
> > > > -Original Message-
> > > > From: Walukiewicz, Miroslaw
> > > > Sent: Thursday, December 11, 2014 5:57 PM
> > > > To: Liang, Cunming; dev at dpdk.org
> > > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per
> lcore
> > > >
> > > > Thank you Cunming for explanation.
> > > >
> > > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > > spinlocks.
> > > >
> > > > Mirek
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming
> Liang
> > > > > Sent: Thursday, December 11, 2014 3:05 AM
> > > > > To: dev at dpdk.org
> > > > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > > >
> > > > >
> > > > > Scope & Usage Scenario
> > > > > 
> > > > >
> > > > > DPDK usually pin pthread per core to avoid task switch overhead. It
> gains
> > > > > performance a lot, but it's not efficient in all cases. In some 
> > > > > cases, it
> may
> > > > > too expensive to use the whole core for a lightweight workload. It's a
> > > > > reasonable demand to have multiple threads per core and each
> threads
> > > > > share CPU
> > > > > in an assigned weight.
> > > > >
> > > > > In fact, nothing avoid user to create normal pthread and using cgroup
> to
> > > > > control the CPU share. One of the purpose for the patchset is to clean
> the
> > > > > gaps of using more DPDK libraries in the normal pthread. In addition, 
> > > > > it
> > > > > demonstrates performance gain by proactive 'yield' when doing id

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-18 Thread Stephen Hemminger
On Thu, 18 Dec 2014 12:20:07 +
"Walukiewicz, Miroslaw"  wrote:

>  Could we extend values returned by rte_lcore_id() to set them per thread 
> (really the DPDK lcore is a pthread but started on specific core) instead of 
> creating linear thread id.

The linear thread id is very useful for having per-core statistics tables.
This is done in lots of places to avoid cache thrashing.
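
As a rough illustration of such a table, the sketch below pads each entry to
its own cache line so that threads updating neighbouring entries do not share
a line. The 64-byte line size, the table size and all names are assumptions
made for the example, not taken from DPDK.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed cache line size */
#define MAX_THREADS 128     /* illustrative bound on linear thread ids */

/* Aligning the first member pads the whole struct to one cache line. */
struct thread_stats {
    alignas(CACHE_LINE_SIZE) uint64_t rx_packets;
    uint64_t rx_bytes;
};

static struct thread_stats stats_table[MAX_THREADS];

/*
 * Account one received packet for the calling thread's entry.
 * The caller must pass a linear_id below MAX_THREADS.
 */
static void stats_add_rx(unsigned linear_id, uint64_t bytes)
{
    struct thread_stats *s = &stats_table[linear_id];

    s->rx_packets++;
    s->rx_bytes += bytes;
}
```

Because each thread only writes its own entry, no locking is needed on the
update path; a reader that sums the table may see slightly stale values.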


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-15 Thread Liang, Cunming
Hi Mirek,

That sounds great.
Looking forward to it.

-Cunming

> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Monday, December 15, 2014 7:11 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Hi Cunming,
> 
> The timers could be used by any application/library started as a standard
> pthread.
> Each pthread needs to have assigned some identifier same way as you are doing
> it for mempools (the rte_linear_thread_id and rte_lcore_id are good examples)
> 
> I made series of patches extending the rte timers API to use with such kind of
> identifier keeping existing API working also.
> 
> I will send it soon.
> 
> Mirek
> 
> 
> > -Original Message-
> > From: Liang, Cunming
> > Sent: Friday, December 12, 2014 6:45 AM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> > Thanks Mirek. That's a good point which wasn't mentioned in cover letter.
> > For 'rte_timer', I only expect it be used within the 'legacy per-lcore' 
> > pthread.
> > I'm appreciate if you can give me some cases which can't use it to fit.
> > In case have to use 'rte_timer' in multi-pthread, there are some
> > prerequisites and limitations.
> > 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do 
> > pthread
> > init by rte_pthread_prepare)
> > 2. As 'rte_timer' is not preemptable, when using rte_timer_manager/reset in
> > multi-pthread, make sure they're not on the same core.
> >
> > -Cunming
> >
> > > -----Original Message-
> > > From: Walukiewicz, Miroslaw
> > > Sent: Thursday, December 11, 2014 5:57 PM
> > > To: Liang, Cunming; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > > Thank you Cunming for explanation.
> > >
> > > What about DPDK timers? They also depend on rte_lcore_id() to avoid
> > spinlocks.
> > >
> > > Mirek
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > > > Sent: Thursday, December 11, 2014 3:05 AM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > >
> > > > Scope & Usage Scenario
> > > > 
> > > >
> > > > DPDK usually pin pthread per core to avoid task switch overhead. It 
> > > > gains
> > > > performance a lot, but it's not efficient in all cases. In some cases, 
> > > > it may
> > > > too expensive to use the whole core for a lightweight workload. It's a
> > > > reasonable demand to have multiple threads per core and each threads
> > > > share CPU
> > > > in an assigned weight.
> > > >
> > > > In fact, nothing avoid user to create normal pthread and using cgroup to
> > > > control the CPU share. One of the purpose for the patchset is to clean 
> > > > the
> > > > gaps of using more DPDK libraries in the normal pthread. In addition, it
> > > > demonstrates performance gain by proactive 'yield' when doing idle loop
> > > > in packet IO. It also provides several 'rte_pthread_*' APIs to easy 
> > > > life.
> > > >
> > > >
> > > > Changes to DPDK libraries
> > > > ==
> > > >
> > > > Some of DPDK libraries must run in DPDK environment.
> > > >
> > > > # rte_mempool
> > > >
> > > > In rte_mempool doc, it mentions a thread not created by EAL must not
> > use
> > > > mempools. The root cause is it uses a per-lcore cache inside mempool.
> > > > And 'rte_lcore_id()' will not return a correct value.
> > > >
> > > > The patchset changes this a little. The index of mempool cache won't be 
> > > > a
> > > > lcore_id. Instead of it, using a linear number generated by the 
> > > > allocator.
> > > > For those legacy EAL per-lcore thread, it apply for an unique linear id
> > > > during creation. For those normal pthread expecting to use
> > rte_mempool, it
> > > > requires to apply for a linear id explicitly. Now the mempool cache 
> > > > looks
> > like
> > > > a per-thread base. The linear ID actually identify for the li

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-15 Thread Walukiewicz, Miroslaw
Hi Cunming, 

The timers could be used by any application/library started as a standard
pthread.
Each pthread needs to be assigned some identifier, the same way as you are doing
it for mempools (rte_linear_thread_id and rte_lcore_id are good examples).

I have made a series of patches extending the rte timers API to work with that
kind of identifier while keeping the existing API working.
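
One possible shape for such an identifier, sketched in C11. This is not the
code from the RFC or from the timer patches; the names, the bound and the
allocation policy (ids handed out once and never recycled) are assumptions
made for illustration.

```c
#include <stdatomic.h>

#define MAX_THREADS 128      /* illustrative bound on linear ids */
#define THREAD_ID_NONE (-1)

/* Next id to hand out; shared by all threads, zero-initialised. */
static atomic_int next_linear_id;

/* Each thread caches its own id in thread-local storage. */
static _Thread_local int my_linear_id = THREAD_ID_NONE;

/*
 * Return the calling thread's linear id, allocating one on first use.
 * Returns THREAD_ID_NONE once MAX_THREADS ids have been handed out.
 */
static int thread_linear_id(void)
{
    if (my_linear_id == THREAD_ID_NONE) {
        int id = atomic_fetch_add(&next_linear_id, 1);

        if (id >= MAX_THREADS)
            return THREAD_ID_NONE;
        my_linear_id = id;
    }
    return my_linear_id;
}
```

A library such as the timers could then index its per-thread state with this
id regardless of how the pthread was created.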

I will send it soon. 

Mirek


> -Original Message-
> From: Liang, Cunming
> Sent: Friday, December 12, 2014 6:45 AM
> To: Walukiewicz, Miroslaw; dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> Thanks Mirek. That's a good point which wasn't mentioned in cover letter.
> For 'rte_timer', I only expect it be used within the 'legacy per-lcore' 
> pthread.
> I'm appreciate if you can give me some cases which can't use it to fit.
> In case have to use 'rte_timer' in multi-pthread, there are some
> prerequisites and limitations.
> 1. Make sure thread local variable 'lcore_id' is set correctly (e.g. do 
> pthread
> init by rte_pthread_prepare)
> 2. As 'rte_timer' is not preemptable, when using rte_timer_manager/reset in
> multi-pthread, make sure they're not on the same core.
> 
> -Cunming
> 

[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-12 Thread Liang, Cunming
Thanks Mirek. That's a good point which wasn't mentioned in the cover letter.
For 'rte_timer', I only expect it to be used within the 'legacy per-lcore' pthread.
I'd appreciate it if you could give me some cases that this doesn't fit.
If 'rte_timer' has to be used in multi-pthread, there are some prerequisites
and limitations.
1. Make sure the thread-local variable 'lcore_id' is set correctly (e.g. do the
pthread init with rte_pthread_prepare).
2. As 'rte_timer' is not preemptable, when using rte_timer_manage/reset in
multi-pthread, make sure they're not on the same core.

-Cunming


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-11 Thread Cunming Liang

Scope & Usage Scenario
======================

DPDK usually pins one pthread per core to avoid task-switch overhead. This
gains a lot of performance, but it's not efficient in all cases. In some cases
it may be too expensive to dedicate a whole core to a lightweight workload.
It's a reasonable demand to have multiple threads per core, with each thread
sharing the CPU according to an assigned weight.

In fact, nothing prevents a user from creating normal pthreads and using
cgroups to control the CPU share. One purpose of the patchset is to close the
gaps that keep more DPDK libraries from being used in a normal pthread. In
addition, it demonstrates the performance gain from a proactive 'yield' when
idle-looping in packet IO. It also provides several 'rte_pthread_*' APIs to
make life easier.


Changes to DPDK libraries
=========================

Some DPDK libraries must run in the DPDK environment.

# rte_mempool

The rte_mempool doc states that a thread not created by EAL must not use
mempools. The root cause is that the mempool uses a per-lcore cache
internally, and 'rte_lcore_id()' will not return a correct value in such a
thread.

The patchset changes this a little. The mempool cache is no longer indexed by
lcore_id. Instead, it is indexed by a linear number generated by an allocator.
Each legacy EAL per-lcore thread applies for a unique linear id during
creation. A normal pthread that expects to use rte_mempool has to apply for a
linear id explicitly. The mempool cache now looks like a per-thread cache, and
the linear id effectively serves as the linear thread id.
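
To make the linear id idea concrete, here is a minimal sketch in C. It is not
the patchset's code: the names (linear_tid_get, my_cache, MAX_LINEAR_TID,
struct obj_cache) are hypothetical and only illustrate one way a thread could
lazily apply for an id and use it to index a per-thread cache table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_LINEAR_TID     64    /* hypothetical number of cache slots */
#define LINEAR_TID_INVALID (-1)

/* Next unused id, shared by all threads. */
static atomic_int next_linear_tid;

/* Per-thread id, playing the role lcore_id plays for EAL threads. */
static _Thread_local int linear_tid = LINEAR_TID_INVALID;

/*
 * Return the calling thread's linear id, allocating one on first use.
 * Returns LINEAR_TID_INVALID once all MAX_LINEAR_TID ids are taken.
 */
static int linear_tid_get(void)
{
    if (linear_tid == LINEAR_TID_INVALID) {
        int id = atomic_load(&next_linear_tid);

        /* Claim id by moving the counter from id to id + 1. On failure
         * the CAS reloads id, so the bound is re-checked every round. */
        while (id < MAX_LINEAR_TID &&
               !atomic_compare_exchange_weak(&next_linear_tid, &id, id + 1))
            ;
        if (id < MAX_LINEAR_TID)
            linear_tid = id;
    }
    return linear_tid;
}

/* Hypothetical per-thread object cache, indexed by linear id. */
struct obj_cache {
    unsigned len;
    void *objs[32];
};

static struct obj_cache caches[MAX_LINEAR_TID];

/* Cache slot of the calling thread, or NULL if no id could be allocated. */
static struct obj_cache *my_cache(void)
{
    int id = linear_tid_get();

    return id == LINEAR_TID_INVALID ? NULL : &caches[id];
}
```

A thread that gets NULL back would have to bypass the cache. Whether the
patchset handles exhaustion that way is not stated here, so treat it as an
assumption of the sketch.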

However, there's another problem: rte_mempool is not preemptable. The problem
comes from rte_ring, so both are discussed together in the next section.

# rte_ring

rte_ring supports multi-producer enqueue and multi-consumer dequeue, but it's
not preemptable. This has been discussed before:
http://dpdk.org/ml/archives/dev/2013-November/000714.html

Let's say there are two pthreads running on the same core, both enqueuing on
the same rte_ring. If the 1st pthread is preempted by the 2nd pthread after it
has already modified prod.head, the 2nd pthread will spin until the 1st one is
scheduled again. That wastes time. If the 2nd pthread has strictly higher
priority, it's even worse.
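
The stall described above can be sketched with a toy ring in C. This is a
simplified illustration, not the rte_ring source: struct toy_ring and the
toy_* functions are hypothetical names, and only the producer side is modeled.
A producer first reserves a slot by moving prod_head, then has to wait until
every earlier reservation has been published through prod_tail.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 8u                /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

struct toy_ring {
    atomic_uint prod_head;  /* next slot a producer may reserve */
    atomic_uint prod_tail;  /* slots below this are visible to consumers */
    atomic_uint cons_tail;  /* slots below this have been consumed */
    void *slots[RING_SIZE];
};

/* Step 1: reserve one slot by advancing prod_head.
 * Returns 0 and stores the slot index, or -1 if the ring is full. */
static int toy_reserve(struct toy_ring *r, unsigned *slot)
{
    unsigned head = atomic_load(&r->prod_head);

    do {
        if (head - atomic_load(&r->cons_tail) >= RING_SIZE)
            return -1;
    } while (!atomic_compare_exchange_weak(&r->prod_head, &head, head + 1));

    *slot = head;
    return 0;
}

/* True if all earlier reservations are published, so `slot` is next. */
static int toy_can_commit(struct toy_ring *r, unsigned slot)
{
    return atomic_load(&r->prod_tail) == slot;
}

/* Step 2: store the object and publish it in reservation order. */
static void toy_commit(struct toy_ring *r, unsigned slot, void *obj)
{
    r->slots[slot & RING_MASK] = obj;

    /* Busy-wait for earlier producers. If one of them was preempted
     * between step 1 and step 2, this loop spins until it runs again. */
    while (!toy_can_commit(r, slot))
        ;
    atomic_store(&r->prod_tail, slot + 1);
}
```

If pthread A is preempted after toy_reserve() and pthread B on the same core
then reserves the next slot, B sits in the busy-wait inside toy_commit() until
the scheduler lets A finish its own commit.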

But that doesn't mean it can't be used. We just need to narrow down the
situations in which it's used by multiple pthreads on the same core.
- It CAN be used for any single-producer or single-consumer situation.
- It MAY be used by multi-producer/consumer pthreads whose scheduling policies
are all SCHED_OTHER (cfs). The user SHOULD be aware of the performance penalty
before using it.
- It MUST NOT be used by multi-producer/consumer pthreads when any of their
scheduling policies is SCHED_FIFO or SCHED_RR.
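
The three rules above can be written down as a small check in C. This is only
a sketch: enum ring_use and classify_ring_use() are hypothetical names, not
part of DPDK or of the patchset. Policies other than SCHED_OTHER, SCHED_FIFO
and SCHED_RR are not covered by the rules, so treating them as forbidden is a
conservative assumption of the sketch.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

enum ring_use {
    RING_USE_OK,               /* the CAN case */
    RING_USE_OK_WITH_PENALTY,  /* the MAY case */
    RING_USE_FORBIDDEN         /* the MUST NOT case */
};

/*
 * multi    - nonzero if several pthreads on one core act as producers
 *            (or as consumers) of the same ring
 * policies - scheduling policy of each of those pthreads
 */
static enum ring_use classify_ring_use(int multi, const int *policies,
                                       size_t n)
{
    if (!multi)
        return RING_USE_OK;

    for (size_t i = 0; i < n; i++) {
        /* SCHED_FIFO and SCHED_RR are forbidden by the rules above.
         * Anything else that is not SCHED_OTHER is rejected too,
         * which is an assumption rather than a stated rule. */
        if (policies[i] != SCHED_OTHER)
            return RING_USE_FORBIDDEN;
    }
    return RING_USE_OK_WITH_PENALTY;
}
```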


Performance
===========

Introducing task switching costs performance. From the packet IO perspective,
we can gain some of it back by improving the effective IO rate. When a pthread
idle-loops on an empty rx queue, it should proactively yield. We can also slow
down rx for a little while to take more advantage of bulk receiving in the
next loop. In practice, increasing the rx ring size also helps to improve the
overall throughput.
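
As a rough illustration of the proactive yield, here is a sketch of a polling
loop in C. The rx function pointer, struct poll_stats and fake_rx() are
hypothetical stand-ins so the loop can run without a NIC, and sched_yield() is
just one possible way to give up the CPU. It is not the testpmd change from
the patchset.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

#define BURST 32

/* Hypothetical rx callback: fills pkts[] and returns how many it got. */
typedef uint16_t (*rx_burst_fn)(void *queue, void **pkts, uint16_t max);

struct poll_stats {
    uint64_t polls;        /* total rx attempts */
    uint64_t empty_polls;  /* attempts that returned nothing */
    uint64_t pkts;         /* packets received */
};

/* Poll the queue `iterations` times. Whenever a poll comes back empty,
 * yield so another pthread on the same core can run. */
static void poll_with_yield(rx_burst_fn rx, void *queue,
                            unsigned iterations, struct poll_stats *st)
{
    void *pkts[BURST];

    for (unsigned i = 0; i < iterations; i++) {
        uint16_t n = rx(queue, pkts, BURST);

        st->polls++;
        if (n == 0) {
            st->empty_polls++;
            sched_yield();
            continue;
        }
        st->pkts += n;  /* packet processing would go here */
    }
}

/* Stand-in rx function: `queue` points at a count of pending packets. */
static uint16_t fake_rx(void *queue, void **pkts, uint16_t max)
{
    unsigned *pending = queue;
    uint16_t n = *pending < max ? (uint16_t)*pending : max;

    for (uint16_t i = 0; i < n; i++)
        pkts[i] = 0;
    *pending -= n;
    return n;
}
```

The ratio of empty_polls to polls gives a feel for how much CPU time the
yield hands back to the other pthreads on the core.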


Cgroup Control
==============

Here's a simple example: there are four pthreads doing packet IO on the same
core. We expect the CPU share ratio to be 1:1:2:4.
> mkdir /sys/fs/cgroup/cpu/dpdk
> mkdir /sys/fs/cgroup/cpu/dpdk/thread0
> mkdir /sys/fs/cgroup/cpu/dpdk/thread1
> mkdir /sys/fs/cgroup/cpu/dpdk/thread2
> mkdir /sys/fs/cgroup/cpu/dpdk/thread3
> cd /sys/fs/cgroup/cpu/dpdk
> echo 256 > thread0/cpu.shares
> echo 256 > thread1/cpu.shares
> echo 512 > thread2/cpu.shares
> echo 1024 > thread3/cpu.shares
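
For reference, cpu.shares is a relative weight, so the values above translate
into CPU fractions only while all four groups are busy at the same time. The
small C helper below (share_permille is a hypothetical name) just does that
arithmetic for sibling groups. Note that the commands above only create and
weight the groups. Each pthread still has to be attached to its group, which
with cgroup v1 is normally done by writing its thread id to the group's tasks
file.

```c
#include <assert.h>
#include <stddef.h>

/* Expected share of a fully contended CPU for group `idx`, in permille,
 * given the cpu.shares values of all sibling groups. */
static unsigned share_permille(const unsigned *shares, size_t n, size_t idx)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += shares[i];
    assert(total > 0 && idx < n);

    return (unsigned)(shares[idx] * 1000ul / total);
}
```

With shares of 256, 256, 512 and 1024 the total is 2048, so the four threads
get 12.5%, 12.5%, 25% and 50% of the core, which is the 1:1:2:4 ratio.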


-END-

Any comments are welcome.

Thanks

Cunming Liang (7):
  eal: add linear thread id as pthread-local variable
  mempool: use linear-tid as mempool cache index
  ring: use linear-tid as ring debug stats index
  eal: add simple API for multi-pthread
  testpmd: support multi-pthread mode
  sample: add new sample for multi-pthread
  eal: macro for cpuset w/ or w/o CPU_ALLOC

 app/test-pmd/cmdline.c|  41 +
 app/test-pmd/testpmd.c|  84 -
 app/test-pmd/testpmd.h|   1 +
 config/common_linuxapp|   1 +
 examples/multi-pthread/Makefile   |  57 ++
 examples/multi-pthread/main.c | 232 
 examples/multi-pthread/main.h |  46 +
 lib/librte_eal/common/include/rte_eal.h   |  15 ++
 lib/librte_eal/common/include/rte_lcore.h |  12 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 282 +++---
 lib/librte_mempool/rte_mempool.h  |  22 +--
 lib/librte_ring/rte_ring.h|   6 +-
 12 files changed, 755 insertions(+), 44 deletions(-)
 create mode 100644 examples/multi-pthread/Makefile
 create mode 100644 examples/multi-pthread/main.c
 create mode 100644 examples/multi-pthread/main.h

-- 
1.8.1.4



[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-11 Thread Walukiewicz, Miroslaw
Thank you, Cunming, for the explanation.

What about DPDK timers? They also depend on rte_lcore_id() to avoid spinlocks. 

Mirek


[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

2014-12-11 Thread Jayakumar, Muthurajan
Steve, 

Great write up.
Nice explanation of 1) per-lcore numbering and 2) multi-producer/consumer
enqueue/dequeue.

Thanks,
