> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> Sent: January 8, 2021 18:58
> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
> van Haaren <harry.van.haa...@intel.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>; nd
> <n...@arm.com>
> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> pipeline test
>
> Hi Feifei,
>
> >Hi, Pavan
> >
> >> -----Original Message-----
> >> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> >> Sent: January 8, 2021 17:13
> >> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
> >> van Haaren <harry.van.haa...@intel.com>
> >> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> >> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
> >> <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >> pipeline test
> >>
> >> Hi Feifei,
> >>
> >> >Hi, Pavan
> >> >
> >> >> -----Original Message-----
> >> >> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> >> >> Sent: January 5, 2021 17:29
> >> >> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry
> >> >> van Haaren <harry.van.haa...@intel.com>
> >> >> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> >> >> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Ruifeng Wang
> >> >> <ruifeng.w...@arm.com>; nd <n...@arm.com>
> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >> >> pipeline test
> >> >>
> >> >> Hi Feifei,
> >> >>
> >> >> >Hi, Pavan
> >> >> >
> >> >> >Sorry for my late reply and thanks very much for your review.
> >> >> >> -----Original Message-----
> >> >> >> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
> >> >> >> Sent: December 22, 2020 18:33
> >> >> >> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com;
> >> >> >> Harry van Haaren <harry.van.haa...@intel.com>; Pavan Nikhilesh
> >> >> >> <pbhagavat...@caviumnetworks.com>
> >> >> >> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> >> >> >> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Phil Yang
> >> >> >> <phil.y...@arm.com>
> >> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
> >> >> >> for pipeline test
> >> >> >>
> >> >> >> >Add release barriers before updating the processed packets for
> >> >> >> >worker lcores to ensure the worker lcore has really finished data
> >> >> >> >processing and then it can update the processed packets number.
> >> >> >>
> >> >> >> I believe we can live with minor inaccuracies in the stats being
> >> >> >> presented, as atomics are pretty heavy when the scheduler is
> >> >> >> limited to a burst size of 1.
> >> >> >>
> >> >> >> One option is to move it before a pipeline operation
> >> >> >> (pipeline_event_tx, pipeline_fwd_event etc.)
> >> >> >> as they imply an implicit release barrier (as all the changes done
> >> >> >> to the event should be visible to the next core).
> >> >> >
> >> >> >If I understand correctly, you mean moving the release barriers
> >> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure the
> >> >> >event has been processed before the next core begins to tx/fwd.
> >> >> >For example:
> >> >>
> >> >> What I meant was that event APIs such as `rte_event_enqueue_burst`
> >> >> and `rte_event_eth_tx_adapter_enqueue` act as an implicit release
> >> >> barrier, and the API `rte_event_dequeue_burst` acts as an implicit
> >> >> acquire barrier.
> >> >>
> >> >> Since the pipeline_* tests start with a dequeue() and end with an
> >> >> enqueue(), I don't believe we need barriers in between.
> >> >
> >> >Sorry for my misunderstanding. I agree with you that no barriers are
> >> >needed between dequeue and enqueue.
> >> >
> >> >Now, let's go back to the beginning. With this patch, our barrier is
> >> >mainly for the synchronization variable "w->processed_pkts". The event
> >> >is first dequeued and then enqueued; only after this can the event be
> >> >treated as processed and included in the statistics
> >> >("w->processed_pkts++").
> >> >
> >> >Thus, we add a release barrier before "w->processed_pkts++" to prevent
> >> >this operation from being executed ahead of time, as in:
> >> >dequeue -> w->processed_pkts++ -> enqueue
> >> >This would mean the worker has not actually finished processing the
> >> >event, but the event is already treated as processed and included in
> >> >the statistics.
> >> But the current sequence is dequeue -> enqueue -> w->processed_pkts++,
> >> and enqueue already acts as an explicit release barrier, right?
> >
> >Sorry, maybe I cannot understand how "enqueue" acts as an explicit
> >release barrier. I can think of two possibilities:
> >1. As you said before, all the changes done to the event should be
> >visible to the next core, and enqueue is an operation on the event, so
> >the next core has to wait for the event to be enqueued.
> >I think this holds only because of the data dependence on the same
> >variable. However, 'w->processed_pkts' and 'ev' are different variables,
> >so this cannot prevent 'w->processed_pkts++' from happening before the
> >enqueue. The main core may then load an updated 'w->processed_pkts'
> >while the event is actually still being processed. For example:
> >
> > Time Slot    Worker 1                 Main core
> > 1            dequeue
> > 2            w->processed_pkts++
> > 3                                     load w->processed_pkts
> > 4            enqueue
> >
> >2. Some release barriers are already included in enqueue. There is a
> >release barrier in rte_ring_enqueue:
> >move head -> copy elements to the ring -> release barrier -> update tail
> >-> w->processed_pkts++
> >However, this barrier cannot prevent 'w->processed_pkts++' from being
> >reordered before the tail update, and once the tail update has finished,
> >the enqueue is considered complete.
>
> I was talking about case 2 in particular; almost all enqueue calls have
> some kind of release barrier in place. I do agree w->processed_pkts++
> might get reordered with the tail update, but since enqueue itself is a
> ldr + blr I was hoping that it wouldn't occur.
>
> We can continue the discussion once I have some performance data.
>
Ok, that's great. I think this is a meaningful discussion. Thanks for your effort~.
Best Regards
Feifei

> Thanks for your patience :)
> Pavan.
>
> >> >_____________________________________________________________
> >> >
> >> >By the way, I have two other questions about the pipeline test in
> >> >"test_pipeline_queue".
> >> >
> >> >1. When do we start counting processed events (w->processed_pkts)?
> >> >For the fwd mode (internal_port = false), when we choose a single
> >> >stage, the application increments the number of events processed after
> >> >"pipeline_event_enqueue". However, when we choose multiple stages, the
> >> >application increments the number of events processed before
> >> >"pipeline_event_enqueue".
> >>
> >> We count an event as processed when all the stages have completed and
> >> it is transmitted.
> >>
> >> >So, maybe we can unify this. For example, for multiple stages:
> >> >
> >> >			if (cq_id == last_queue) {
> >> >				ev.queue_id = tx_queue[ev.mbuf->port];
> >> >				rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> >> >				pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >+				pipeline_event_enqueue(dev, port, &ev);
> >> >				w->processed_pkts++;
> >> >			} else {
> >> >				ev.queue_id++;
> >> >				pipeline_fwd_event(&ev, sched_type_list[cq_id]);
> >> >+				pipeline_event_enqueue(dev, port, &ev);
> >> >			}
> >> >
> >> >-			pipeline_event_enqueue(dev, port, &ev);
> >>
> >> The above change makes sense.
> >>
> >Thanks for your review, and I'll fold this change into the next version.
> >
> >> >2. Is "pipeline_event_enqueue" needed after "pipeline_event_tx" in tx
> >> >mode?
> >> >For single_stage_burst_tx mode, after "pipeline_event_tx" the worker
> >> >still enqueues again via "pipeline_event_enqueue_burst", so maybe we
> >> >should skip the rest of the loop after "pipeline_event_tx",
> >>
> >> We call enqueue burst to release the events, i.e. enqueue events with
> >> RTE_EVENT_OP_RELEASE.
> >>
> >However, in the case of a single event, for
> >'pipeline_queue_worker_single_stage_tx' and
> >'pipeline_queue_worker_multi_stage_tx', there is no release operation
> >after tx.
>
> >> >for example:
> >> >
> >> >			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >				pipeline_event_tx(dev, port, &ev[i]);
> >> >				ev[i].op = RTE_EVENT_OP_RELEASE;
> >> >				w->processed_pkts++;
> >> >+				continue;
> >> >			} else {
> >> >				ev[i].queue_id++;
> >> >				pipeline_fwd_event(&ev[i],
> >> >						RTE_SCHED_TYPE_ATOMIC);
> >> >			}
> >> >		}
> >> >
> >> >		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
> >> >
> >> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >> >+	__atomic_thread_fence(__ATOMIC_RELEASE);
> >> >> >	pipeline_event_tx(dev, port, &ev);
> >> >> >	w->processed_pkts++;
> >> >> >} else {
> >> >> >	ev.queue_id++;
> >> >> >+	__atomic_thread_fence(__ATOMIC_RELEASE);
> >> >> >	pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >> >
> >> >> >	pipeline_event_enqueue(dev, port, &ev);
> >> >> >
> >> >> >However, there are two reasons against this:
> >> >> >
> >> >> >First, compared with other tests in app/eventdev, for example the
> >> >> >eventdev perf test, the wmb is placed after the event operation to
> >> >> >ensure the operation has finished and then
> >> >> >w->processed_pkts++.
> >> >>
> >> >> The perf_* tests start with a dequeue() and finally end with a
> >> >> mempool_put(), which should also act as an implicit acquire/release
> >> >> pair, making the stats consistent?
> >> >
> >> >For the perf tests, this consistency refers to the wmb after
> >> >mempool_put().
> >> >Please refer to this link:
> >> >http://patches.dpdk.org/patch/85634/
> >> >
> >> >> >So, if we move the release barriers before tx/fwd, it may make the
> >> >> >tests in app/eventdev inconsistent with each other. This may reduce
> >> >> >the maintainability of the code and make it harder to understand.
> >> >> >
> >> >> >Second, this is a test case: though a heavy barrier may cause some
> >> >> >performance degradation, it ensures that the operation order and
> >> >> >the test result are correct. And maybe for a test case, correctness
> >> >> >is more important than performance.
> >> >>
> >> >> Most of our internal perf test runs are on 24/48 core combinations,
> >> >> and since the Octeontx2 event device driver supports a burst size of
> >> >> 1, it will show up as a huge performance degradation.
> >> >For the impact on performance, I ran the test with the software event
> >> >device; the following are some test results:
> >> >-----------------------------------------------------------------------
> >> >Architecture: aarch64
> >> >Nics: ixgbe-82599
> >> >CPU: Cortex-A72
> >> >BURST_SIZE: 1
> >> >Command: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 --
> >> >--test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
> >> >Flow: one flow, 64-byte packets, TX rate: 1.4 Mpps
> >> >
> >> >Without this patch:
> >> >0.954 mpps avg 0.953 mpps
> >> >
> >> >With this patch:
> >> >0.932 mpps avg 0.930 mpps
> >> >-----------------------------------------------------------------------
> >> >
> >> >Based on the results above, there is no significant performance
> >> >degradation with this patch. This is because the release barrier is
> >> >only for "w->processed_pkts++". It just ensures that the worker core
> >> >increments the number of events processed after the enqueue, and it
> >> >doesn't affect dequeue/enqueue:
> >> >
> >> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
> >>
> >> Here enqueue already acts as an explicit release barrier.
> >>
> >Please refer to the reasons above.
>
> >> >On the other hand, I infer the reason for the slight decrease in
> >> >measured performance is that the release barrier prevents
> >> >"w->processed_pkts++" from executing before the event has been
> >> >processed (enqueued). But I think this test result is closer to the
> >> >real performance.
> >> >
> >> >And sorry, we have no Octeontx2 device, so there is no test
> >> >result on the Octeontx2 event device driver.
> >> >Would you please help us test this patch on Octeontx2 when it is
> >> >convenient for you? Thanks very much.
> >>
> >> I will report the performance numbers on Monday.
> >>
> >That's great, thanks very much for your help.
> >
> >Best Regards
> >Feifei
> >
> >> >Best Regards
> >> >Feifei
> >>
> >> Regards,
> >> Pavan.
> >>
> >> >> >So, due to the two reasons above, I'm ambivalent about what we
> >> >> >should do next.
> >> >> >
> >> >> >Best Regards
> >> >> >Feifei
> >> >>
> >> >> Regards,
> >> >> Pavan.
> >> >>
> >> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
> >> >> >> >functions")
> >> >> >> >Cc: pbhagavat...@marvell.com
> >> >> >> >Cc: sta...@dpdk.org
> >> >> >> >
> >> >> >> >Signed-off-by: Phil Yang <phil.y...@arm.com>
> >> >> >> >Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> >> >> >> >Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> >> >> >> >---
> >> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
> >> >> >> >+++++++++++++++++++++----
> >> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >> >> >> >
> >> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c
> >> >> >> >b/app/test-eventdev/test_pipeline_queue.c
> >> >> >> >index 7bebac34f..0c0ec0ceb 100644
> >> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
> >> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >> >> >> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
> >> >> >> >*arg)
> >> >> >> > 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >> >> > 			pipeline_event_tx(dev, port, &ev);
> >> >> >> >-			w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+			/* release barrier here ensures stored operation
> >> >> >> >+			 * of the event completes before the number of
> >> >> >> >+			 * processed pkts is visible to the main core
> >> >> >> >+			 */
> >> >> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+					__ATOMIC_RELEASE);
> >> >> >> > 		} else {
> >> >> >> > 			ev.queue_id++;
> >> >> >> > 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >> >> >@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
> >> >> >> > 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> >> >> >> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >> >> > 		pipeline_event_enqueue(dev, port, &ev);
> >> >> >> >-		w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+		/* release barrier here ensures stored operation
> >> >> >> >+		 * of the event completes before the number of
> >> >> >> >+		 * processed pkts is visible to the main core
> >> >> >> >+		 */
> >> >> >> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+				__ATOMIC_RELEASE);
> >> >> >> > 	}
> >> >> >> >
> >> >> >> > 	return 0;
> >> >> >> >@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
> >> >> >> > 			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >> >> > 				pipeline_event_tx(dev, port, &ev[i]);
> >> >> >> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
> >> >> >> >-				w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+				/* release barrier here ensures stored operation
> >> >> >> >+				 * of the event completes before the number of
> >> >> >> >+				 * processed pkts is visible to the main core
> >> >> >> >+				 */
> >> >> >> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+						__ATOMIC_RELEASE);
> >> >> >> > 			} else {
> >> >> >> > 				ev[i].queue_id++;
> >> >> >> > 				pipeline_fwd_event(&ev[i],
> >> >> >> >@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> >> >> >> > 		}
> >> >> >> >
> >> >> >> > 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
> >> >> >> >-		w->processed_pkts += nb_rx;
> >> >> >> >+
> >> >> >> >+		/* release barrier here ensures stored operation
> >> >> >> >+		 * of the event completes before the number of
> >> >> >> >+		 * processed pkts is visible to the main core
> >> >> >> >+		 */
> >> >> >> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
> >> >> >> >+				__ATOMIC_RELEASE);
> >> >> >> > 	}
> >> >> >> >
> >> >> >> > 	return 0;
> >> >> >> >@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
> >> >> >> > 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
> >> >> >> > 			pipeline_event_tx(dev, port, &ev);
> >> >> >> >-			w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+			/* release barrier here ensures stored operation
> >> >> >> >+			 * of the event completes before the number of
> >> >> >> >+			 * processed pkts is visible to the main core
> >> >> >> >+			 */
> >> >> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+					__ATOMIC_RELEASE);
> >> >> >> > 			continue;
> >> >> >> > 		}
> >> >> >> >
> >> >> >> >@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
> >> >> >> > 			ev.queue_id = tx_queue[ev.mbuf->port];
> >> >> >> > 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> >> >> >> > 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >> >> >-			w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+			/* release barrier here ensures stored operation
> >> >> >> >+			 * of the event completes before the number of
> >> >> >> >+			 * processed pkts is visible to the main core
> >> >> >> >+			 */
> >> >> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+					__ATOMIC_RELEASE);
> >> >> >> > 		} else {
> >> >> >> > 			ev.queue_id++;
> >> >> >> > 			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
> >> >> >> >@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> >> >> >> > 			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
> >> >> >> > 				pipeline_event_tx(dev, port, &ev[i]);
> >> >> >> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
> >> >> >> >-				w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+				/* release barrier here ensures stored operation
> >> >> >> >+				 * of the event completes before the number of
> >> >> >> >+				 * processed pkts is visible to the main core
> >> >> >> >+				 */
> >> >> >> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+						__ATOMIC_RELEASE);
> >> >> >> > 				continue;
> >> >> >> > 			}
> >> >> >> >
> >> >> >> >@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >> >> >> > 				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> >> >> >> > 				pipeline_fwd_event(&ev[i],
> >> >> >> > 						RTE_SCHED_TYPE_ATOMIC);
> >> >> >> >-				w->processed_pkts++;
> >> >> >> >+
> >> >> >> >+				/* release barrier here ensures stored operation
> >> >> >> >+				 * of the event completes before the number of
> >> >> >> >+				 * processed pkts is visible to the main core
> >> >> >> >+				 */
> >> >> >> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >> >> >+						__ATOMIC_RELEASE);
> >> >> >> > 			} else {
> >> >> >> > 				ev[i].queue_id++;
> >> >> >> > 				pipeline_fwd_event(&ev[i],
> >> >> >> >--
> >> >> >> >2.17.1