Atomic increment performance degrades as the number of cores increases; see chapter 5 of https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2015.01.31a.pdf for some measurements on a conventional Intel machine. At a high enough core count, this overhead may become larger than the overhead associated with the atomic queue.
Alex

On 12 May 2015 at 14:20, Ola Liljedahl <ola.liljed...@linaro.org> wrote:

> I think it should be OK to use ordered queues instead of atomic queues,
> e.g. for sequence number allocation (which will need an atomic variable to
> hold the seqno counter). Packet ingress order will be maintained but might
> not always correspond to sequence number order. This is not a problem as the
> sliding window in the replay protection will take care of that; the sliding
> window could be hundreds of entries (sequence numbers) large (each will
> only take one bit). Packet ingress order is the important characteristic
> that must be maintained. The IPsec sequence number is not used for packet
> ordering and order restoration; it is only used for replay protection.
>
> Someone with a platform which supports both ordered and atomic scheduling
> could benchmark both designs and see how performance scales when using
> ordered queues (and that atomic fetch_and_add) for some relevant traffic
> patterns.
>
> On 8 May 2015 at 13:53, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>
>> Jerin,
>>
>> Can you propose such a set of APIs for further discussion? This would be
>> good to discuss at the Tuesday call.
>>
>> Thanks.
>>
>> Bill
>>
>> On Fri, May 8, 2015 at 12:07 AM, Jacob, Jerin <
>> jerin.ja...@caviumnetworks.com> wrote:
>>
>>>
>>> I agree with Ola here on preserving the ingress order.
>>> However, I have experienced the same performance issue that Nikhil pointed
>>> out (atomic queues have too much overhead for short critical sections).
>>>
>>> I am not sure about any other HW but Cavium has support for
>>> introducing the critical section while maintaining the ingress order as a
>>> HW scheduler feature.
>>>
>>> IMO, if such support is available in other HW then an
>>> odp_schedule_ordered_lock()/unlock()
>>> kind of API will solve the performance issue for the need for short
>>> critical sections in an ordered flow.
>>>
>>> /Jerin.
>>>
>>>
>>> From: lng-odp <lng-odp-boun...@lists.linaro.org> on behalf of Ola
>>> Liljedahl <ola.liljed...@linaro.org>
>>> Sent: Thursday, May 7, 2015 9:06 PM
>>> To: nikhil.agar...@freescale.com
>>> Cc: lng-odp@lists.linaro.org
>>> Subject: Re: [lng-odp] Query regarding sequence number update in IPSEC
>>> application
>>>
>>>
>>> Using atomic queues will preserve the ingress order when allocating and
>>> assigning the sequence number. Also you don't need to use an expensive
>>> atomic operation for updating the sequence number as the atomic queue and
>>> scheduling will provide mutual exclusion.
>>>
>>> If the packets that require a sequence number came from parallel or
>>> ordered queues, there would be no guarantee that the sequence numbers would
>>> be allocated in packet (ingress) order. Just using an atomic operation
>>> (e.g. fetch_and_add or similar) only guarantees proper update of the
>>> sequence number variable, not any specific ordering.
>>>
>>> If you are ready to trade absolute "correctness" for performance, you
>>> could use ordered or maybe even parallel (questionable for other reasons)
>>> queues and then allocate the sequence number using an atomic fetch_and_add.
>>> Sometimes packet egress order will then not match the sequence number
>>> order (for a flow/SA). For IPSec, this might affect the replay window check
>>> & update at the receiving end but as the replay protection uses a sliding
>>> window of sequence numbers (to handle misordered packets), there might not
>>> be any adverse effects in practice. The most important aspect is probably
>>> to preserve original packet order.
>>>
>>> -- Ola
>>>
>>>
>>> On 6 May 2015 at 11:29, nikhil.agar...@freescale.com <
>>> nikhil.agar...@freescale.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> In the IPSEC example application, queues are used to update the sequence
>>> number. I was wondering why we have used queues to update the sequence
>>> number, which will add to scheduling delays and adversely hit the
>>> performance throughput. Is there any specific advantage of using queues
>>> over atomic variables?
>>>
>>> Thanks in advance
>>> Nikhil
>>>
>>>
>>> _______________________________________________
>>> lng-odp mailing list
>>> lng-odp@lists.linaro.org
>>> https://lists.linaro.org/mailman/listinfo/lng-odp