Hi Feifei,

>Hi, Pavan
>
>Sorry for my late reply and thanks very much for your review.
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>
>> Sent: December 22, 2020 18:33
>> To: Feifei Wang <feifei.wa...@arm.com>; jer...@marvell.com; Harry van
>> Haaren <harry.van.haa...@intel.com>; Pavan Nikhilesh
>> <pbhagavat...@caviumnetworks.com>
>> Cc: dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; Phil Yang
>> <phil.y...@arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>> pipeline test
>>
>>
>> >Add release barriers before updating the processed packet count for
>> >worker lcores, to ensure the worker lcore has actually finished data
>> >processing before it updates the processed packet count.
>> >
>>
>> I believe we can live with minor inaccuracies in the presented stats,
>> as atomics are pretty heavy when the scheduler is limited to a burst
>> size of 1.
>>
>> One option is to move the update before a pipeline operation
>> (pipeline_event_tx, pipeline_fwd_event etc.), as they imply an
>> implicit release barrier (all the changes done to the event should be
>> visible to the next core).
>
>If I understand correctly, you mean moving the release barriers before
>pipeline_event_tx or pipeline_fwd_event. This would ensure the event has
>been processed before the next core begins to tx/fwd. For example:

What I meant was that event APIs such as `rte_event_enqueue_burst` and
`rte_event_eth_tx_adapter_enqueue` act as an implicit release barrier, and
the API `rte_event_dequeue_burst` acts as an implicit acquire barrier.

Since the pipeline_* tests start with a dequeue() and end with an
enqueue(), I don't believe we need barriers in between.
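
To make that concrete, here is a minimal standalone sketch of the
acquire/release pairing (my own illustration, not the driver code;
enqueue_slot()/dequeue_slot() are hypothetical stand-ins for the ordering
the PMD's enqueue/dequeue paths already provide):

	#include <stdint.h>

	struct fake_ring {
		uint64_t data;	/* stands in for the event payload */
		uint64_t ready;	/* stands in for the device-owned flag */
	};

	/* Producer: every store made before the release store is visible
	 * to whoever observes ready == 1 with acquire semantics. This is
	 * the ordering an enqueue already gives us.
	 */
	static void enqueue_slot(struct fake_ring *r, uint64_t v)
	{
		r->data = v;	/* plain store */
		__atomic_store_n(&r->ready, 1, __ATOMIC_RELEASE);
	}

	/* Consumer: the acquire load pairs with the release store above,
	 * which is the ordering a dequeue already gives us.
	 */
	static int dequeue_slot(struct fake_ring *r, uint64_t *v)
	{
		if (__atomic_load_n(&r->ready, __ATOMIC_ACQUIRE) == 0)
			return 0;
		*v = r->data;	/* guaranteed to see the producer's store */
		return 1;
	}

So as long as the stats update sits between a dequeue() and the next
enqueue(), it is already ordered for the core that picks the event up next.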

>
>	if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>		pipeline_event_tx(dev, port, &ev);
>		w->processed_pkts++;
>	} else {
>		ev.queue_id++;
>+		__atomic_thread_fence(__ATOMIC_RELEASE);
>		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>		pipeline_event_enqueue(dev, port, &ev);
>	}
>
>However, there are two reasons not to do this:
>
>First, compared with the other tests in app/eventdev, for example the
>eventdev perf test, the wmb is placed after the event operation, to
>ensure the operation has finished before w->processed_pkts++.

In the case of the perf_* tests, starting with a dequeue() and finally
ending with a mempool_put() should also act as an implicit acquire/release
pair, making the stats consistent?
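
For completeness, the pairing the patch relies on would look roughly like
this on the reader's side (a sketch only; total_packets() is a hypothetical
stand-in for the main core's stats collection loop, not code from the test):

	#include <stdint.h>

	struct worker_data {
		uint64_t processed_pkts;
	};

	/* Worker: release ensures the event's stores complete before the
	 * incremented count becomes visible (as in the patch).
	 */
	static void worker_update(struct worker_data *w, uint64_t nb)
	{
		__atomic_fetch_add(&w->processed_pkts, nb, __ATOMIC_RELEASE);
	}

	/* Main core: acquire pairs with the worker's release, so once a
	 * count is observed, the processing it covers is observed too.
	 */
	static uint64_t total_packets(struct worker_data *workers, int nb_workers)
	{
		uint64_t total = 0;
		int i;

		for (i = 0; i < nb_workers; i++)
			total += __atomic_load_n(&workers[i].processed_pkts,
						 __ATOMIC_ACQUIRE);
		return total;
	}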

>So, if we move the release barriers before tx/fwd, the tests in
>app/eventdev may become inconsistent with each other. This may reduce
>the maintainability of the code and make it difficult to understand.
>
>Second, this is a test case: though the heavier barrier may cause some
>performance degradation, it ensures that the processing and the test
>result are correct. And for a test case, correctness is arguably more
>important than performance.
>

Most of our internal perf tests run on 24/48 core combinations, and since
the Octeontx2 event device driver supports a burst size of 1, this will
show up as a huge performance degradation.
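
If per-packet accounting must stay, one cheaper option (my assumption about
what the test needs, not something I have measured) is a relaxed add on the
hot path. Intermediate stats printed while the test runs may lag slightly,
but the final totals read after the workers are stopped and joined are
exact, since the join itself synchronises:

	#include <stdint.h>

	struct worker_data {
		uint64_t processed_pkts;
	};

	/* Hot path: a relaxed add carries no ordering cost beyond the
	 * atomic RMW itself, which matters at a burst size of 1.
	 */
	static void count_pkt(struct worker_data *w)
	{
		__atomic_fetch_add(&w->processed_pkts, 1, __ATOMIC_RELAXED);
	}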

>So, for the two reasons above, I'm ambivalent about what we should do
>as the next step.
>
>Best Regards
>Feifei

Regards,
Pavan.

>
>> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions")
>> >Cc: pbhagavat...@marvell.com
>> >Cc: sta...@dpdk.org
>> >
>> >Signed-off-by: Phil Yang <phil.y...@arm.com>
>> >Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
>> >Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
>> >---
>> > app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
>> > 1 file changed, 56 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
>> >index 7bebac34f..0c0ec0ceb 100644
>> >--- a/app/test-eventdev/test_pipeline_queue.c
>> >+++ b/app/test-eventdev/test_pipeline_queue.c
>> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
>> >
>> >		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> >			pipeline_event_tx(dev, port, &ev);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> >		} else {
>> >			ev.queue_id++;
>> >			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> >@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
>> >		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> >		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> >		pipeline_event_enqueue(dev, port, &ev);
>> >-		w->processed_pkts++;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+				__ATOMIC_RELEASE);
>> >	}
>> >
>> >	return 0;
>> >@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
>> >			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> >				pipeline_event_tx(dev, port, &ev[i]);
>> >				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored operation
>> >+				 * of the event completes before the number of
>> >+				 * processed pkts is visible to the main core
>> >+				 */
>> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> >			} else {
>> >				ev[i].queue_id++;
>> >				pipeline_fwd_event(&ev[i],
>> >@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>> >		}
>> >
>> >		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>> >-		w->processed_pkts += nb_rx;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>> >+				__ATOMIC_RELEASE);
>> >	}
>> >
>> >	return 0;
>> >@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
>> >
>> >		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>> >			pipeline_event_tx(dev, port, &ev);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> >			continue;
>> >		}
>> >
>> >@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
>> >			ev.queue_id = tx_queue[ev.mbuf->port];
>> >			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> >			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> >		} else {
>> >			ev.queue_id++;
>> >			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>> >@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>> >			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
>> >				pipeline_event_tx(dev, port, &ev[i]);
>> >				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored operation
>> >+				 * of the event completes before the number of
>> >+				 * processed pkts is visible to the main core
>> >+				 */
>> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> >				continue;
>> >			}
>> >
>> >@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>> >
>> >				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>> >				pipeline_fwd_event(&ev[i],
>> >						RTE_SCHED_TYPE_ATOMIC);
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored operation
>> >+				 * of the event completes before the number of
>> >+				 * processed pkts is visible to the main core
>> >+				 */
>> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> >			} else {
>> >				ev[i].queue_id++;
>> >				pipeline_fwd_event(&ev[i],
>> >--
>> >2.17.1
