On 12 May 2015 at 18:06, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>
> On 12 May 2015 at 14:28, Bala Manoharan <bala.manoha...@linaro.org> wrote:
>>
>> IMO, using an atomic variable instead of atomic queues will work for this
>> IPsec example use-case, since here the critical section is required only
>> for updating the sequence number. In a generic use-case, however, the
>> atomicity should be protected over "a region of code" which the
>> application wants executed in ingress order.
>
> Isn't this the atomic queues/scheduling of ODP (and several SoCs)?
Yes. But atomic queue switching is currently possible only through the
scheduler, and it involves scheduler overhead.

>
>> If the HW is capable, we should add additional APIs for scheduler-based
>> locking which can be used by the application in case the critical section
>> is small enough that going through the scheduler would cause a
>> performance impact.
>
> Isn't there a "release atomic scheduling" function? A CPU processing stage
> can start out atomic (HW-provided mutual exclusion) but convert into
> "ordered" scheduling when the mutual exclusion is no longer necessary. Or
> is this missing from the ODP API today?

"Release atomic scheduling" only covers the "Atomic --> Ordered" transition;
I believe the scenario here dictates an "Ordered --> Atomic --> Ordered"
sequence. Currently this sequence goes through the scheduler, and the intent
is to avoid scheduler overhead.

Regards,
Bala

>
>> Regards,
>> Bala
>>
>> On 12 May 2015 at 17:31, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>>
>>> Yes, the seqno counter is a shared resource and the atomic_fetch_and_add
>>> will eventually become a bottleneck for per-SA throughput. But one could
>>> hope that it scales better than using atomic queues, although this
>>> depends on the actual microarchitecture.
>>>
>>> Freescale PPC has "decorated storage"; can't it be used for
>>> atomic_fetch_and_add()?
>>> ARMv8.1 has support for "far atomics", which are supposed to scale better
>>> (again, this depends on the actual implementation).
>>>
>>> On 12 May 2015 at 13:47, Alexandru Badicioiu
>>> <alexandru.badici...@linaro.org> wrote:
>>>>
>>>> Atomic increment performance gets worse as the number of cores
>>>> increases - see
>>>> https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2015.01.31a.pdf
>>>> (chapter 5) for some measurements on a conventional Intel machine.
>>>> It may be possible for this overhead to become bigger than the one
>>>> associated with the atomic queue.
>>>>
>>>> Alex
>>>>
>>>> On 12 May 2015 at 14:20, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>>>>
>>>>> I think it should be OK to use ordered queues instead of atomic queues,
>>>>> e.g. for sequence number allocation (which will need an atomic variable
>>>>> to hold the seqno counter). Packet ingress order will be maintained but
>>>>> might not always correspond to sequence number order. This is not a
>>>>> problem, as the sliding window in the replay protection will take care
>>>>> of it; the sliding window could be hundreds of entries (sequence
>>>>> numbers) large (each takes only one bit). Packet ingress order is the
>>>>> important characteristic that must be maintained. The IPsec sequence
>>>>> number is not used for packet ordering and order restoration; it is
>>>>> only used for replay protection.
>>>>>
>>>>> Someone with a platform which supports both ordered and atomic
>>>>> scheduling could benchmark both designs and see how performance scales
>>>>> when using ordered queues (and that atomic fetch_and_add) for some
>>>>> relevant traffic patterns.
>>>>>
>>>>> On 8 May 2015 at 13:53, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>>>>>>
>>>>>> Jerin,
>>>>>>
>>>>>> Can you propose such a set of APIs for further discussion? This would
>>>>>> be good to discuss at the Tuesday call.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Bill
>>>>>>
>>>>>> On Fri, May 8, 2015 at 12:07 AM, Jacob, Jerin
>>>>>> <jerin.ja...@caviumnetworks.com> wrote:
>>>>>>>
>>>>>>> I agree with Ola here on preserving the ingress order.
>>>>>>> However, I have experienced the same performance issue as Nikhil
>>>>>>> pointed out (atomic queues have too much overhead for a short
>>>>>>> critical section).
>>>>>>>
>>>>>>> I am not sure about other HW, but Cavium has support for introducing
>>>>>>> a critical section while maintaining the ingress order as a HW
>>>>>>> scheduler feature.
>>>>>>>
>>>>>>> IMO, if such support is available in other HW, then an
>>>>>>> odp_schedule_ordered_lock()/unlock() kind of API would solve the
>>>>>>> performance issue for short critical sections in an ordered flow.
>>>>>>>
>>>>>>> /Jerin.
>>>>>>>
>>>>>>> From: lng-odp <lng-odp-boun...@lists.linaro.org> on behalf of Ola
>>>>>>> Liljedahl <ola.liljed...@linaro.org>
>>>>>>> Sent: Thursday, May 7, 2015 9:06 PM
>>>>>>> To: nikhil.agar...@freescale.com
>>>>>>> Cc: lng-odp@lists.linaro.org
>>>>>>> Subject: Re: [lng-odp] Query regarding sequence number update in IPSEC
>>>>>>> application
>>>>>>>
>>>>>>> Using atomic queues will preserve the ingress order when allocating
>>>>>>> and assigning the sequence number. You also don't need an expensive
>>>>>>> atomic operation for updating the sequence number, as the atomic
>>>>>>> queue and scheduling will provide mutual exclusion.
>>>>>>>
>>>>>>> If the packets that require a sequence number came from parallel or
>>>>>>> ordered queues, there would be no guarantee that the sequence numbers
>>>>>>> would be allocated in packet (ingress) order. Just using an atomic
>>>>>>> operation (e.g. fetch_and_add or similar) only guarantees proper
>>>>>>> update of the sequence number variable, not any specific ordering.
>>>>>>>
>>>>>>> If you are ready to trade absolute "correctness" for performance, you
>>>>>>> could use ordered or maybe even parallel (questionable for other
>>>>>>> reasons) queues and then allocate the sequence number using an atomic
>>>>>>> fetch_and_add. Sometimes packet egress order will then not match the
>>>>>>> sequence number order (for a flow/SA). For IPsec, this might affect
>>>>>>> the replay window check & update at the receiving end, but as the
>>>>>>> replay protection uses a sliding window of sequence numbers (to
>>>>>>> handle misordered packets), there might not be any adverse effects
>>>>>>> in practice.
>>>>>>> The most important aspect is probably to preserve the original
>>>>>>> packet order.
>>>>>>>
>>>>>>> -- Ola
>>>>>>>
>>>>>>> On 6 May 2015 at 11:29, nikhil.agar...@freescale.com
>>>>>>> <nikhil.agar...@freescale.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In the IPsec example application, queues are used to update the
>>>>>>> sequence number. I was wondering why we use queues to update the
>>>>>>> sequence number, which adds scheduling delays and adversely affects
>>>>>>> throughput. Is there any specific advantage of using queues over
>>>>>>> atomic variables?
>>>>>>>
>>>>>>> Thanks in advance
>>>>>>> Nikhil

_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
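[Editor's note: a minimal sketch of the two mechanisms discussed in this
thread, written in plain C11 atomics rather than the ODP API. It shows
per-SA sequence number allocation via an atomic fetch-and-add (as suggested
for ordered queues) and a receiver-side sliding replay window that tolerates
the resulting out-of-order sequence numbers. All names here (`sa_ctx`,
`seqno_alloc`, `replay_check_update`, `REPLAY_WIN`) are hypothetical,
for illustration only.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define REPLAY_WIN 128  /* window size in sequence numbers (one bit each) */

struct sa_ctx {
    atomic_uint_fast64_t next_seqno;              /* sender side */
    uint64_t             last_seqno;              /* receiver: highest seen */
    uint64_t             window[REPLAY_WIN / 64]; /* bitmap of seen seqnos */
};

/* Sender: allocate the next sequence number. With ordered (not atomic)
 * queues, concurrent callers may interleave, so numbers are unique but
 * not necessarily allocated in packet ingress order. */
static uint64_t seqno_alloc(struct sa_ctx *sa)
{
    return atomic_fetch_add(&sa->next_seqno, 1);
}

/* Receiver: replay protection. Accept a seqno if it is new and within the
 * sliding window; reject replays and packets older than the window.
 * (Single-threaded for clarity; a real receiver needs its own exclusion.) */
static bool replay_check_update(struct sa_ctx *sa, uint64_t seq)
{
    if (seq > sa->last_seqno) {
        uint64_t shift = seq - sa->last_seqno;
        /* Slide the window forward, clearing bits that enter it. */
        for (uint64_t i = 0; i < shift && i < REPLAY_WIN; i++) {
            uint64_t b = (sa->last_seqno + 1 + i) % REPLAY_WIN;
            sa->window[b / 64] &= ~(1ULL << (b % 64));
        }
        sa->last_seqno = seq;
    } else if (sa->last_seqno - seq >= REPLAY_WIN) {
        return false;  /* too old: fell out of the window */
    }
    uint64_t b = seq % REPLAY_WIN;
    if (sa->window[b / 64] & (1ULL << (b % 64)))
        return false;  /* replay: already seen */
    sa->window[b / 64] |= 1ULL << (b % 64);
    return true;
}
```

Note how this sketch matches Ola's point: even if `seqno_alloc()` hands out
numbers slightly out of ingress order, the receiver's window accepts any
not-yet-seen number within `REPLAY_WIN` of the highest seen, so ingress
order, not seqno order, is the property worth preserving.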