Hi Feifei,

>Hi, Pavan
>
>Sorry for my late reply and thanks very much for your review.
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>> Sent: December 22, 2020 18:33
>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry van
>> Haaren <harry.van.haa...@intel.com>; Pavan Nikhilesh
>> <pbhagavat...@caviumnetworks.com>
>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Phil Yang
>> <phil.y...@arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>> pipeline test
>>
>> >Add release barriers before updating the processed packets for worker
>> >lcores to ensure the worker lcore has really finished data processing
>> >and then it can update the processed packets number.
>> >
>>
>> I believe we can live with minor inaccuracies in the stats being
>> presented, as atomics are pretty heavy when the scheduler is limited to
>> a burst size of 1.
>>
>> One option is to move it before a pipeline operation (pipeline_event_tx,
>> pipeline_fwd_event etc.), as they imply an implicit release barrier (all
>> the changes done to the event should be visible to the next core).
>
>If I understand correctly, you mean that we should move the release
>barriers before pipeline_event_tx or pipeline_fwd_event. This ensures the
>event has been processed before the next core begins to tx/fwd. For example:
What I meant was that event APIs such as `rte_event_enqueue_burst` and
`rte_event_eth_tx_adapter_enqueue` act as an implicit release barrier, and
`rte_event_dequeue_burst` acts as an implicit acquire barrier. Since the
pipeline_* tests start with a dequeue() and end with an enqueue(), I don't
believe we need barriers in between.

>
>	if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>		pipeline_event_tx(dev, port, &ev);
>		w->processed_pkts++;
>	} else {
>		ev.queue_id++;
>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>		pipeline_event_enqueue(dev, port, &ev);
>
>However, there are two reasons against doing this:
>
>First, compared with the other tests in app/eventdev, for example the
>eventdev perf test, the wmb is placed after the event operation to ensure
>the operation has finished before w->processed_pkts++.

In the case of the perf_* tests, they start with a dequeue() and finally
end with a mempool_put(), which should also act as implicit acquire/release
pairs, keeping the stats consistent?

>So, if we move the release barriers before tx/fwd, it may make the tests
>in app/eventdev inconsistent. This may reduce the maintainability of the
>code and make it difficult to understand.
>
>Second, it is a test case: though the heavy barrier may cause performance
>degradation, it can ensure that the processing and the test result are
>correct. And maybe for a test case, correctness is more important than
>performance.

Most of our internal perf tests run on 24/48 core combinations, and since
the Octeontx2 event device driver supports a burst size of 1, it will show
up as a huge performance degradation.

>So, due to the two reasons above, I'm ambivalent about what we should do
>in the next step.
>
>Best Regards
>Feifei

Regards,
Pavan.
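(A minimal sketch, not taken from the patch under discussion, of the pattern
described above: rte_event_dequeue_burst() provides the implicit acquire and
the final enqueue the implicit release, so the per-worker counter can stay a
plain increment. "struct sketch_worker", "sketch_worker_loop" and "done" are
hypothetical names used only for this illustration.)

#include <stdbool.h>
#include <stdint.h>

#include <rte_eventdev.h>
#include <rte_pause.h>

struct sketch_worker {
	uint8_t dev_id;
	uint8_t port_id;
	uint64_t processed_pkts;
};

static int
sketch_worker_loop(struct sketch_worker *w, volatile bool *done)
{
	struct rte_event ev;

	while (!*done) {
		/* implicit acquire: writes made by the core that produced
		 * this event are visible once the dequeue succeeds
		 */
		if (!rte_event_dequeue_burst(w->dev_id, w->port_id, &ev, 1, 0))
			continue;

		/* ... stage processing on the event would happen here ... */

		/* implicit release: all changes done to the event are made
		 * visible to the next core that dequeues it
		 */
		while (!rte_event_enqueue_burst(w->dev_id, w->port_id, &ev, 1))
			rte_pause();

		/* per the discussion, a plain increment is enough here; a
		 * minor inaccuracy in the stats presented to the main core
		 * is acceptable
		 */
		w->processed_pkts++;
	}

	return 0;
}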
>
>> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions")
>> >Cc: pbhagavat...@marvell.com
>> >Cc: sta...@dpdk.org
>> >
>> >Signed-off-by: Phil Yang <phil.y...@arm.com>
>> >Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
>> >Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
>> >---
>> > app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
>> > 1 file changed, 56 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
>> >index 7bebac34f..0c0ec0ceb 100644
>> >--- a/app/test-eventdev/test_pipeline_queue.c
>> >+++ b/app/test-eventdev/test_pipeline_queue.c
>> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
>> >
>> > 	if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> > 		pipeline_event_tx(dev, port, &ev);
>> >-		w->processed_pkts++;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+				__ATOMIC_RELEASE);
>> > 	} else {
>> > 		ev.queue_id++;
>> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> >@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
>> > 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> > 		pipeline_event_enqueue(dev, port, &ev);
>> >-		w->processed_pkts++;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+				__ATOMIC_RELEASE);
>> > 	}
>> >
>> > 	return 0;
>> >@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
>> > 			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> > 				pipeline_event_tx(dev, port, &ev[i]);
>> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored operation
>> >+				 * of the event completes before the number of
>> >+				 * processed pkts is visible to the main core
>> >+				 */
>> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 			} else {
>> > 				ev[i].queue_id++;
>> > 				pipeline_fwd_event(&ev[i],
>> >@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>> > 		}
>> >
>> > 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>> >-		w->processed_pkts += nb_rx;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>> >+				__ATOMIC_RELEASE);
>> > 	}
>> >
>> > 	return 0;
>> >@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
>> >
>> > 	if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>> > 		pipeline_event_tx(dev, port, &ev);
>> >-		w->processed_pkts++;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+				__ATOMIC_RELEASE);
>> > 		continue;
>> > 	}
>> >
>> >@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
>> > 			ev.queue_id = tx_queue[ev.mbuf->port];
>> > 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 		} else {
>> > 			ev.queue_id++;
>> > 			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>> >@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>> > 			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
>> > 				pipeline_event_tx(dev, port, &ev[i]);
>> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored operation
>> >+				 * of the event completes before the number of
>> >+				 * processed pkts is visible to the main core
>> >+				 */
>> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 				continue;
>> > 			}
>> >
>> >@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>> >
>> > 				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>> > 				pipeline_fwd_event(&ev[i],
>> > 						RTE_SCHED_TYPE_ATOMIC);
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored operation
>> >+				 * of the event completes before the number of
>> >+				 * processed pkts is visible to the main core
>> >+				 */
>> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 			} else {
>> > 				ev[i].queue_id++;
>> > 				pipeline_fwd_event(&ev[i],
>> >--
>> >2.17.1