On 12 May 2015 at 15:30, Alexandru Badicioiu <alexandru.badici...@linaro.org
> wrote:

> In my understanding, releasing the atomic context implies _only_ that the
> scheduler is free to return an event from the same atomic queue as the
> previous event. "Conversion" to ordered should be explicit (or create the
> queue with ATOMIC | ORDERED params) and currently is not defined.
> However, there is some confusion about what an atomic context is. It seems
> there can be only one atomic context per thread at a given moment (the
> release call takes no arguments), but what about the following sequence:
>
> Thread A
> ev1 = odp_schedule(NULL, SCHED_WAIT); ----> returns event from atomic queue Q1
> ev2 = odp_schedule(NULL, SCHED_WAIT); ----> returns event from atomic queue Q2
> odp_release_atomic_context()          ----> which context?
>
I don't think the use case with multiple outstanding events per thread has
been considered, but I think this is interesting and should be supported. A
thread could then manually pipeline processing of packets, e.g. dequeue pkt
N+1 and prefetch its relevant data while actually processing pkt N. The
alternative is for ODP and the scheduler to do this, but that would require
perfect flow classification (for millions of flows) and a 1-to-1 mapping of
flows to queues (for millions of queues), where each queue would then be
associated with the relevant context (we already have that piece). Is this
realistic?

Is this use case supported by current networking SoCs?

With DPDK, the application normally dequeues multiple packets from a ring,
so it can implement this manual SW pipelining if it wishes.
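As a rough illustration of that manual SW pipelining, here is a
self-contained C sketch (not DPDK or ODP code; the pkt_t type and the
process() body are made up for the example, and __builtin_prefetch is the
GCC/Clang prefetch hint):

```c
/* Toy packet type and per-packet work; stand-ins for real descriptors. */
typedef struct {
    int hdr;
    int payload;
} pkt_t;

static void process(pkt_t *p)
{
    p->payload += p->hdr; /* placeholder for real packet processing */
}

/* Process a burst, prefetching packet N+1 while processing packet N. */
static void process_burst(pkt_t *pkts[], int n)
{
    for (int i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(pkts[i + 1], 0, 3); /* warm pkt N+1 */
        process(pkts[i]);
    }
}
```

The prefetch hides the memory latency of packet N+1 behind the computation
on packet N, which is the point of dequeuing a burst instead of one event.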

The odp_release_atomic_context() call should probably take an event as a
parameter (e.g. ev1 or ev2 in your example above). The function must be
called before the event is re-enqueued or freed.

-- Ola



> Alex
>
> On 12 May 2015 at 15:36, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>
>> On 12 May 2015 at 14:28, Bala Manoharan <bala.manoha...@linaro.org>
>> wrote:
>>
>>> IMO, using an atomic variable instead of atomic queues will work for this
>>> IPsec example use case, since here the critical section is required only
>>> for updating the sequence number. In a generic use case, however, atomicity
>>> must be protected over "a region of code" which the application wants
>>> executed in ingress order.
>>>
>> Isn't this the atomic queues/scheduling of ODP (and several SoC's)?
>>
>>
>>> If HW is capable, we should add additional APIs for scheduler-based
>>> locking, which the application can use in case the critical section is
>>> small enough that going through the scheduler would cause a performance
>>> impact.
>>>
>> Isn't there a "release atomic scheduling" function? A CPU processing
>> stage can start out atomic (HW-provided mutual exclusion) but convert to
>> "ordered" scheduling when the mutual exclusion is no longer necessary. Or
>> is this missing from the ODP API today?
>>
>>
>>> Regards,
>>> Bala
>>>
>>> On 12 May 2015 at 17:31, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>>
>>>> Yes, the seqno counter is a shared resource and the atomic_fetch_and_add
>>>> will eventually become a bottleneck for per-SA throughput. But one could
>>>> hope that it scales better than using atomic queues, although this
>>>> depends on the actual microarchitecture.
>>>>
>>>> Freescale PPC has "decorated storage", can't it be used for
>>>> atomic_fetch_and_add()?
>>>> ARMv8.1 has support for "far atomics" which are supposed to scale
>>>> better (again depends on the actual implementation).
>>>>
>>>> On 12 May 2015 at 13:47, Alexandru Badicioiu <
>>>> alexandru.badici...@linaro.org> wrote:
>>>>
>>>>> Atomic increment performance worsens as the number of cores increases;
>>>>> see
>>>>> https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2015.01.31a.pdf
>>>>> (chapter 5) for some measurements on a conventional Intel machine.
>>>>> This overhead may even become bigger than the one associated with the
>>>>> atomic queue.
>>>>>
>>>>> Alex
>>>>>
>>>>> On 12 May 2015 at 14:20, Ola Liljedahl <ola.liljed...@linaro.org>
>>>>> wrote:
>>>>>
>>>>>> I think it should be OK to use ordered queues instead of atomic
>>>>>> queues, e.g. for sequence number allocation (which will need an atomic
>>>>>> variable to hold the seqno counter). Packet ingress order will be
>>>>>> maintained but might not always correspond to sequence number order.
>>>>>> This is not a problem, as the sliding window in the replay protection
>>>>>> will take care of it; the sliding window can span hundreds of entries
>>>>>> (sequence numbers), each taking only one bit. Packet ingress order is
>>>>>> the important characteristic that must be maintained. The IPsec
>>>>>> sequence number is not used for packet ordering or order restoration;
>>>>>> it is only used for replay protection.
>>>>>>
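The seqno-counter and sliding-window idea discussed above can be sketched in
plain, self-contained C (this is a simulation, not the ODP API; the 64-bit
window size and all names are illustrative, and C11 stdatomic stands in for
whatever atomic the platform provides):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Sender side: allocate sequence numbers with an atomic fetch-and-add.
 * Under ordered (non-atomic) scheduling, seqno order may not match packet
 * egress order; the receiver's window tolerates that. */
static atomic_uint_fast64_t next_seqno = 1;

static uint64_t alloc_seqno(void)
{
    return atomic_fetch_add(&next_seqno, 1);
}

/* Receiver side: 64-entry sliding replay window, one bit per seqno. */
struct replay_window {
    uint64_t top;  /* highest seqno accepted so far */
    uint64_t bits; /* bit i set => seqno (top - i) already seen */
};

static bool replay_check_update(struct replay_window *w, uint64_t seq)
{
    if (seq > w->top) {                  /* new highest: slide the window */
        uint64_t shift = seq - w->top;
        w->bits = (shift >= 64) ? 0 : w->bits << shift;
        w->bits |= 1;                    /* mark 'seq' itself as seen */
        w->top = seq;
        return true;
    }
    uint64_t off = w->top - seq;
    if (off >= 64)
        return false;                    /* older than the window: reject */
    uint64_t mask = 1ULL << off;
    if (w->bits & mask)
        return false;                    /* duplicate: reject (replay) */
    w->bits |= mask;                     /* misordered but fresh: accept */
    return true;
}
```

A packet whose seqno lands inside the window is accepted exactly once, so
moderate misordering between seqno allocation and transmission is harmless,
as the paragraph above argues.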
>>>>>> Someone with a platform that supports both ordered and atomic
>>>>>> scheduling could benchmark both designs and see how performance scales
>>>>>> when using ordered queues (and the atomic fetch_and_add) for some
>>>>>> relevant traffic patterns.
>>>>>>
>>>>>> On 8 May 2015 at 13:53, Bill Fischofer <bill.fischo...@linaro.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Jerrin,
>>>>>>>
>>>>>>> Can you propose such a set of APIs for further discussion?  This
>>>>>>> would be good to discuss at the Tuesday call.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Bill
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 12:07 AM, Jacob, Jerin <
>>>>>>> jerin.ja...@caviumnetworks.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I agree with Ola here on preserving the ingress order.
>>>>>>>> However, I have experienced the same performance issue Nikhil pointed
>>>>>>>> out (atomic queues have too much overhead for short critical
>>>>>>>> sections).
>>>>>>>>
>>>>>>>> I am not sure about other HW, but Cavium has support for introducing
>>>>>>>> a critical section while maintaining the ingress order as a HW
>>>>>>>> scheduler feature.
>>>>>>>>
>>>>>>>> IMO, if such support is available in other HW, then an
>>>>>>>> odp_schedule_ordered_lock()/unlock() kind of API would solve the
>>>>>>>> performance issue for short critical sections in ordered flows.
>>>>>>>>
>>>>>>>> /Jerin.
>>>>>>>>
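Conceptually, the ordered lock Jerin describes behaves like a ticket lock
keyed on ingress order: a thread enters the critical section only when it is
its packet's turn. A minimal single-threaded C simulation of that semantic
(not an ODP API; the function names and the software spin-wait are
illustrative stand-ins for what the HW scheduler would do):

```c
#include <stdatomic.h>
#include <stdint.h>

/* 'turn' plays the role of the HW scheduler: it admits the holder of
 * ingress-order number N into the critical section only when turn == N,
 * so the section executes in ingress order even with parallel workers. */
static atomic_uint_fast64_t turn;

static void ordered_lock(uint64_t ingress_order)
{
    while (atomic_load(&turn) != ingress_order)
        ; /* spin until this packet's turn */
}

static void ordered_unlock(uint64_t ingress_order)
{
    atomic_store(&turn, ingress_order + 1); /* admit the next packet */
}
```

The appeal for short critical sections is that the thread keeps its event
and waits briefly in place, instead of paying a full round trip through an
atomic queue and the scheduler.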
>>>>>>>>
>>>>>>>> From: lng-odp <lng-odp-boun...@lists.linaro.org> on behalf of Ola
>>>>>>>> Liljedahl <ola.liljed...@linaro.org>
>>>>>>>> Sent: Thursday, May 7, 2015 9:06 PM
>>>>>>>> To: nikhil.agar...@freescale.com
>>>>>>>> Cc: lng-odp@lists.linaro.org
>>>>>>>> Subject: Re: [lng-odp] Query regarding sequence number update in
>>>>>>>> IPSEC application
>>>>>>>>
>>>>>>>>
>>>>>>>> Using atomic queues will preserve the ingress order when allocating
>>>>>>>> and assigning the sequence number. You also don't need an expensive
>>>>>>>> atomic operation to update the sequence number, as the atomic queue
>>>>>>>> and scheduling provide mutual exclusion.
>>>>>>>>
>>>>>>>>
>>>>>>>> If the packets that require a sequence number came from parallel or
>>>>>>>> ordered queues, there would be no guarantee that the sequence numbers
>>>>>>>> would be allocated in packet (ingress) order. Just using an atomic
>>>>>>>> operation (e.g. fetch_and_add or similar) only guarantees proper
>>>>>>>> update of the sequence number variable, not any specific ordering.
>>>>>>>>
>>>>>>>>
>>>>>>>> If you are ready to trade absolute "correctness" for performance,
>>>>>>>> you could use ordered or maybe even parallel (questionable for other
>>>>>>>> reasons) queues and then allocate the sequence number using an atomic
>>>>>>>> fetch_and_add. Packet egress order will then sometimes not match the
>>>>>>>> sequence number order (for a flow/SA). For IPsec, this might affect
>>>>>>>> the replay window check & update at the receiving end, but as the
>>>>>>>> replay protection uses a sliding window of sequence numbers (to
>>>>>>>> handle misordered packets), there might not be any adverse effects in
>>>>>>>> practice. The most important aspect is probably to preserve the
>>>>>>>> original packet order.
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Ola
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6 May 2015 at 11:29,  nikhil.agar...@freescale.com <
>>>>>>>> nikhil.agar...@freescale.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In the IPSEC example application, queues are used to update the
>>>>>>>> sequence number. I was wondering why queues are used for this, as it
>>>>>>>> adds scheduling delays and adversely hits the performance throughput.
>>>>>>>> Is there any specific advantage of using queues over atomic
>>>>>>>> variables?
>>>>>>>>
>>>>>>>> Thanks in advance
>>>>>>>>  Nikhil
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> lng-odp mailing list
>>>>>>>> lng-odp@lists.linaro.org
>>>>>>>> https://lists.linaro.org/mailman/listinfo/lng-odp
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>