On 9/21/2018 7:37 AM, Honnappa Nagarahalli wrote:
>>>>>>>
>>>>>>> @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo,
>>>>>>> void **data, unsigned num) static inline uint32_t
>>>>>>> kni_fifo_count(struct rte_kni_fifo *fifo) {
>>>>>>> +#ifdef RTE_USE_C11_MEM_MODEL
>>>>>>> + unsigned fifo_write = __atomic_load_n(&fifo->write,
>>>>>>> + __ATOMIC_ACQUIRE);
>>>>>>> + unsigned fifo_read = __atomic_load_n(&fifo->read,
>>>>>>> +
>>>>>>> +__ATOMIC_ACQUIRE);
>>>>>>
>>>>>> Isn't too heavy to have two __ATOMIC_ACQUIREs? a simple
>>>>>> rte_smp_rmb() would be enough here. Right?
>>>>>> or
>>>>>> Do we need __ATOMIC_ACQUIRE for fifo_write case?
>>>>>>
>>>>> We also had some amount of debate internally on this:
>>>>> 1) We do not want to use rte_smp_rmb() as we want to keep the
>>>>> memory
>>>> models separated (for ex: while using C11, use C11 everywhere). It
>>>> is also not sufficient, please see 3) below.
>>>>
>>>> But Nothing technically wrong in using rte_smp_rmb() here in terms
>>>> functionally and code generated by the compiler.
>>>
>>> rte_smp_rmb() generates 'DMB ISHLD'. This works fine, but it is not optimal.
>> 'LDAR' is a better option which is generated when C11 atomics are used.
>>
>> Yes. But which one is optimal 1 x DMB ISHLD vs 2 x LDAR ?
>
> Good point. I am not sure which one is optimal, it needs to be measured. 'DMB
> ISHLD' orders 'all' earlier loads against 'all' later loads and stores.
> 'LDAR' orders the 'specific' load with 'all' later loads and stores.
>
>>
>>>
>>>>
>>>>> 2) This API can get called from writer or reader, so both the
>>>>> loads have to be __ATOMIC_ACQUIRE
>>>>> 3) Other option is to use __ATOMIC_RELAXED. That would allow any
>>>> loads/stores around of this API to get reordered, especially since
>>>> this is an inline function. This would put burden on the application
>>>> to manage the ordering depending on its usage. It will also require
>>>> the application to understand the implementation of this API.
>>>>
>>>> __ATOMIC_RELAXED may be fine too for _count() case as it may not
>>>> very important to get the exact count for the exact very moment,
>>>> Application can retry.
>>>>
>>>> I am in favor of performance effective implementation.
>>>
>>> The requirement on the correctness of the count depends on the usage of
>> this function. I see the following usage:
>>>
>>> In the file kni_net.c, function: kni_net_tx:
>>>
>>> if (kni_fifo_free_count(kni->tx_q) == 0 ||
>>> kni_fifo_count(kni->alloc_q) == 0) {
>>> /**
>>> * If no free entry in tx_q or no entry in alloc_q,
>>> * drops skb and goes out.
>>> */
>>> goto drop;
>>> }
>>>
>>> There is no retry here, the packet is dropped.
>>
>> OK. Then pick an implementation which is an optimal this case.
>> I think, then rte_smp_rmb() makes sense here as
>> a) no #ifdef clutter
>> b) it is optimal compared to 2 x LDAR
>>
> As I understand, one of the principals of using C11 model is to match the
> store releases and load acquires. IMO, combining C11 memory model with
> barrier based functions makes the code unreadable.
> I realized rte_smp_rmb() is required for x86 as well to prevent compiler
> reordering. We can add that in the non-C11 case. This way, we will have clean
> code for both the options (similar to rte_ring).
> So, if 'RTE_USE_C11_MEM_MODEL' is set to 'n', then the 'rte_smp_rmb' would be
> used.
>
> We can look at handling the #ifdef clutter based on Ferruh's feedback.
Hi Honnappa, Jerin,
Sorry for delay, I missed that this is waiting my input.
+1 to remove #ifdef, but I don't think a separate file is required for this,
specially when it will be duplication of same implementation, nothing arch
specific implementation.
+1 Honnappa's suggestion to hide ifdef's behind APIs, plus those APIs can be
reused later...
And +1 to split into two patches, one for fix to current code and one for c11
atomic implementation support.
I have some basic questions on the patch, will send in different thread.
Thanks,
ferruh
>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Other than that, I prefer to avoid ifdef clutter by introducing
>>>>>> two separate file just like ring C11 implementation.
>>>>>>
>>>>>> I don't have strong opinion on this this part, I let KNI
>>>>>> MAINTAINER to decide on how to accommodate this change.
>>>>>
>>>>> I prefer to change this as well, I am open for suggestions.
>>>>> Introducing two separate files would be too much for this library.
>>>>> A better
>>>> way would be to have something similar to 'smp_store_release'
>>>> provided by the kernel. i.e. create #defines for loads/stores. Hide
>>>> the clutter behind the #defines.
>>>>
>>>> No Strong opinion on this, leaving to KNI Maintainer.
>>> Will wait on this before re-spinning the patch
>>>
>>>>
>>>> This patch needs to split by two,
>>>> a) Fixes for non C11 implementation(i.e new addition to
>>>> rte_smp_wmb())
>>>> b) add support for C11 implementation.
>>> Agree
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> + return (fifo->len + fifo_write - fifo_read) &
>>>>>>> +(fifo->len - 1); #else
>>>>>>> return (fifo->len + fifo->write - fifo->read) &
>>>>>>> (fifo->len
>>>>>>> - 1);
> Requires rte_smp_rmb() for x86 to prevent compiler reordering.
>
>>>>>>> +#endif
>>>>>>> }
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>