Re: [dpdk-dev] [PATCH v8 1/3] doc: add optimizations using C11 atomic built-ins

David Marchand Thu, 16 Jul 2020 03:35:49 -0700

Hello,

On Thu, Jul 16, 2020 at 6:58 AM Phil Yang <[email protected]> wrote:
>
> Add information about possible optimizations using C11 atomic built-ins.


We are missing a review on this doc update.

Thanks.


-- 
David Marchand

>
> Signed-off-by: Phil Yang <[email protected]>
> Signed-off-by: Honnappa Nagarahalli <[email protected]>
> ---
>  doc/guides/prog_guide/writing_efficient_code.rst | 59 +++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
> index 849f63e..53a1ca1 100644
> --- a/doc/guides/prog_guide/writing_efficient_code.rst
> +++ b/doc/guides/prog_guide/writing_efficient_code.rst
> @@ -167,7 +167,13 @@ but with the added cost of lower throughput.
>  Locks and Atomic Operations
>  ---------------------------
>
> -Atomic operations imply a lock prefix before the instruction,
> +This section describes some key considerations when using locks and atomic
> +operations in the DPDK environment.
> +
> +Locks
> +~~~~~
> +
> +On x86, atomic operations imply a lock prefix before the instruction,
>  causing the processor's LOCK# signal to be asserted during execution of the following instruction.
>  This has a big impact on performance in a multicore environment.
>
> @@ -176,6 +182,57 @@ It can often be replaced by other solutions like per-lcore variables.
>  Also, some locking techniques are more efficient than others.
>  For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks.
>
> +Atomic Operations: Use C11 Atomic Built-ins
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The generic DPDK rte_atomic operations are implemented with __sync built-ins.
> +On aarch64, these __sync built-ins result in full barriers, which are
> +unnecessary in many use cases. They can be replaced by __atomic built-ins,
> +which conform to the C11 memory model and give finer control over memory order.
> +
> +Replacing the rte_atomic operations with __atomic built-ins might therefore
> +improve performance on aarch64 machines.
> +
> +Some typical optimization cases are listed below:
> +
> +Atomicity
> +^^^^^^^^^
> +
> +Some use cases require atomicity alone; the ordering of the memory operations
> +does not matter. For example, packet statistics counters need to be
> +incremented atomically but do not need any particular memory ordering.
> +In such cases, RELAXED memory ordering is sufficient.
> +
> +One-way Barrier
> +^^^^^^^^^^^^^^^
> +
> +Some use cases allow memory operations to be reordered in one direction
> +while requiring ordering in the other direction.
> +
> +For example, memory operations before a spinlock lock are allowed to move
> +into the critical section, but memory operations in the critical section
> +are not allowed to move above the lock. In this case, the full memory barrier
> +in the compare-and-swap operation can be replaced with ACQUIRE memory order.
> +Conversely, memory operations after the spinlock unlock are allowed to move
> +into the critical section, but memory operations in the critical section
> +are not allowed to move below the unlock. So the store operation can use
> +RELEASE memory order instead of a full barrier.
> +
> +Reader-Writer Concurrency
> +^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Lock-free reader-writer concurrency is one of the common use cases in DPDK.
> +
> +The payload, or the data that the writer wants to communicate to the reader,
> +can be written with RELAXED memory order. However, the guard variable should
> +be written with RELEASE memory order. This ensures that the store to the guard
> +variable is observable only after the store to the payload is observable.
> +
> +Correspondingly, on the reader side, the guard variable should be read
> +with ACQUIRE memory order. The payload, or the data the writer communicated,
> +can be read with RELAXED memory order. This ensures that, if the store to the
> +guard variable is observable, the store to the payload is also observable.
> +
>  Coding Considerations
>  ---------------------
>
> --
> 2.7.4
>
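
The Atomicity case in the patch above can be illustrated with a short C sketch. It assumes GCC or Clang (for the __atomic built-ins) and POSIX threads. The struct port_stats type and the rx_worker()/run_rx_workers() helpers are hypothetical names for illustration, not DPDK APIs. Several threads bump one shared counter with RELAXED order. No increment is lost, yet no ordering is requested.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define PKTS_PER_THREAD 100000

/* Hypothetical statistics block for illustration; not a DPDK type. */
struct port_stats {
	uint64_t rx_pkts;
};

static struct port_stats stats;

/* Each worker counts "received packets". Only atomicity is needed, so
 * RELAXED order is used rather than the full barrier that
 * __sync_fetch_and_add() implies on aarch64. */
static void *rx_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < PKTS_PER_THREAD; i++)
		__atomic_fetch_add(&stats.rx_pkts, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
static uint64_t run_rx_workers(int nthreads)
{
	pthread_t tid[8];

	assert(nthreads > 0 && nthreads <= 8);
	__atomic_store_n(&stats.rx_pkts, 0, __ATOMIC_RELAXED);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, rx_worker, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	/* pthread_join() synchronizes with each worker, so this relaxed
	 * load observes every increment. */
	return __atomic_load_n(&stats.rx_pkts, __ATOMIC_RELAXED);
}
```

Calling run_rx_workers(4) returns 400000. As the Locks section notes, per-lcore counters can often avoid the shared atomic altogether.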

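The One-way Barrier case can be sketched as a minimal spinlock. It makes the same assumptions: GCC/Clang __atomic built-ins and POSIX threads. demo_spinlock_t and its helpers are hypothetical. They are not the rte_spinlock implementation and only show where ACQUIRE (on the compare-and-swap) and RELEASE (on the unlocking store) go. A plain, non-atomic counter protected by the lock checks that no update is lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not rte_spinlock_t. */
typedef struct {
	int locked; /* 0 = free, 1 = held */
} demo_spinlock_t;

static void demo_spinlock_lock(demo_spinlock_t *sl)
{
	int expected = 0;

	/* ACQUIRE on success: accesses inside the critical section cannot
	 * move above the lock, while earlier accesses may sink into it. */
	while (!__atomic_compare_exchange_n(&sl->locked, &expected, 1, false,
					    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* The failed CAS overwrote expected; wait until the lock
		 * looks free, then retry. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
		expected = 0;
	}
}

static void demo_spinlock_unlock(demo_spinlock_t *sl)
{
	/* RELEASE: accesses inside the critical section cannot move below
	 * the unlock, while later accesses may hoist into it. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}

#define LOCK_ITERS 100000

static demo_spinlock_t demo_lock;
static long protected_total; /* plain variable, guarded by demo_lock */

static void *locked_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < LOCK_ITERS; i++) {
		demo_spinlock_lock(&demo_lock);
		protected_total++;
		demo_spinlock_unlock(&demo_lock);
	}
	return NULL;
}

/* Runs nthreads workers and returns the total they produced. */
static long run_locked_workers(int nthreads)
{
	pthread_t tid[8];

	assert(nthreads > 0 && nthreads <= 8);
	protected_total = 0;
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, locked_worker, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return protected_total;
}
```
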

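The Reader-Writer Concurrency case can be sketched with a single guard flag. This also assumes GCC/Clang __atomic built-ins and POSIX threads. struct msg_slot, publish(), try_consume() and run_publish_consume() are hypothetical names chosen for illustration. The writer stores the payload with RELAXED order and the guard with RELEASE. The reader loads the guard with ACQUIRE and the payload with RELAXED.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define PAYLOAD_WORDS 4

/* Hypothetical single-slot mailbox: a payload plus a guard flag. */
struct msg_slot {
	uint32_t payload[PAYLOAD_WORDS];
	uint32_t ready; /* guard: 0 = empty, 1 = payload published */
};

static struct msg_slot slot;

/* Writer: payload stores may be RELAXED; the RELEASE store to the guard
 * cannot become visible before them. */
static void publish(const uint32_t *data)
{
	for (int i = 0; i < PAYLOAD_WORDS; i++)
		__atomic_store_n(&slot.payload[i], data[i], __ATOMIC_RELAXED);
	__atomic_store_n(&slot.ready, 1, __ATOMIC_RELEASE);
}

/* Reader: ACQUIRE load of the guard; once it reads 1, the payload stores
 * are visible, so RELAXED payload loads are enough. Returns 1 on success. */
static int try_consume(uint32_t *out)
{
	if (__atomic_load_n(&slot.ready, __ATOMIC_ACQUIRE) == 0)
		return 0;
	for (int i = 0; i < PAYLOAD_WORDS; i++)
		out[i] = __atomic_load_n(&slot.payload[i], __ATOMIC_RELAXED);
	return 1;
}

static void *writer_thread(void *arg)
{
	publish(arg);
	return NULL;
}

/* Publishes in[] from a writer thread and spins until the reader sees it. */
static void run_publish_consume(uint32_t *in, uint32_t *out)
{
	pthread_t writer;
	int rc;

	__atomic_store_n(&slot.ready, 0, __ATOMIC_RELAXED);
	rc = pthread_create(&writer, NULL, writer_thread, in);
	assert(rc == 0);
	(void)rc;
	while (!try_consume(out))
		; /* busy-wait for the guard */
	pthread_join(writer, NULL);
}
```

The sketch covers a single publication per call. Repeatedly reusing one slot with concurrent writers would need a richer protocol than a single flag.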