> On Jul 5, 2024, at 5:10 PM, Pavan Nikhilesh Bhagavatula 
> <pbhagavat...@marvell.com> wrote:
> 
>> 04/07/2024 16:55, Stephen Hemminger:
>>> On Thu, 04 Jul 2024 16:14:42 +0200
>>> Thomas Monjalon <tho...@monjalon.net> wrote:
>>> 
>>>>>> Let’s ask Pavan why this flag is used in cn10k driver.
>>>>>> 
>>>>>> From our perspective, WFE is available on all the supported arm
>>>>>> platforms in DPDK.
>>>>>> Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose
>>>>>> between WFE and non-WFE code paths due to performance reasons rather
>>>>>> than as a flag that indicates the availability of the instruction on
>>>>>> the target CPU.
>>>>>> 
>>>>> 
>>>>> We are using this flag to allow application to choose between WFE and
>>>>> non-WFE code path.
>>>>> The non-WFE path performs slightly better.
>>>> 
>>>> What's the benefit of the WFE path then?
>>> 
>>> WFE saves power at the expense of latency.
>> 
>> Yes maybe there is a misunderstanding.
>> Pavan can you confirm you were saying "throughput is better on non-WFE"?
>> but "power consumption is lower on WFE path"?
>> 
> 
> Yes, throughput is better on non-WFE and power consumption is lower on WFE 
> path.
> 
> But the statement can't be generalized for all use-cases; it depends on a lot 
> of factors.
> So, we use RTE_ARM_USE_WFE to allow applications to decide what they want.

When WFE was first enabled in DPDK, it was used in the spinlock, ticket lock, 
ring, etc. We ran the relevant micro-benchmarks and found that performance was 
lower with WFE. Hence it was put under a flag so that users can choose whether 
to use it (not as a way of saying that the instruction is present in the CPU).

IMO, we should not use this flag for PMD power savings. In a PMD, WFE is used 
purely for power savings, not for performance. IIRC, there is already code, 
with enough configurable parameters, that controls when the PMD calls WFE (or 
its equivalent on other architectures). So there is no need for a compile-time 
flag for this.
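
To illustrate the kind of runtime control meant here, below is a rough, portable
sketch. It is not the existing DPDK or PMD code: `wait_cfg`, `spin_limit`,
`low_power_wait()` and `hybrid_wait()` are all hypothetical names invented for
this example, and `low_power_wait()` is only a stand-in for the point where WFE
(or its equivalent on other architectures) would be issued. The idea is that a
runtime parameter, rather than a compile-time flag, decides when the waiter
stops busy-polling and starts using the low-power wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime knob (not a DPDK API): number of busy polls to make
 * before switching to the low-power wait. */
struct wait_cfg {
    uint32_t spin_limit;
};

/* Counts how often the low-power path was taken; for illustration only. */
static uint32_t low_power_waits;

/* Stand-in for the architecture's wait primitive. On Arm this is where WFE
 * would be issued; here it only records the call so the sketch is portable. */
static void low_power_wait(void)
{
    low_power_waits++;
}

/* Poll *flag up to max_polls times. The first cfg->spin_limit unsuccessful
 * polls are pure busy polls; every later one is followed by low_power_wait().
 * Returns true if the flag was observed non-zero. */
static bool hybrid_wait(atomic_uint *flag, const struct wait_cfg *cfg,
                        uint32_t max_polls)
{
    assert(flag != NULL && cfg != NULL);

    for (uint32_t i = 0; i < max_polls; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        if (i >= cfg->spin_limit)
            low_power_wait();
    }
    return false;
}
```

With `spin_limit` set very high the waiter behaves like the pure polling path;
with `spin_limit` set to 0 it uses the low-power wait from the first miss. The
application picks the trade-off at run time without rebuilding.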

> 
>>> Maybe some form of hybrid approach would work best and could
>>> be always used.
>>> 
>>> For example, many implementations of mutex do a short spin poll
>>> then fall back to a waiting primitive (like futex).
> 
> This is already done across cnxk drivers and common layer I believe.
> 
>> 
> 
> 
