On Thu, Nov 7, 2019 at 10:35 PM David Marchand <david.march...@redhat.com> wrote:
>
> DPDK has multiple use cases where the core repeatedly polls a location in
> memory. This polling results in many cache and memory transactions.
>
> The Arm architecture provides the WFE (Wait For Event) instruction, which
> allows the CPU core to enter a low-power state until woken up by an update
> to the memory location being polled, thus reducing the cache and memory
> transactions.
>
> x86 has the PAUSE hint instruction to reduce such overhead.
>
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'.
>
> For non-Arm platforms, these APIs are just wrappers around a do-while loop
> with rte_pause, so there are no performance differences.
>
> For Arm platforms, use of WFE can be configured using the
> CONFIG_RTE_USE_WFE option. It is disabled by default.
>
> Currently, use of WFE is supported only for aarch64 platforms. armv7
> platforms do support the WFE instruction, but they require explicit wake-up
> events (sev) and are less performant.
>
> Testing shows that performance varies across different platforms, with
> some showing degradation.
>
> CONFIG_RTE_USE_WFE should be enabled depending on the performance on the
> target platforms.
>
> V13:
> - added release notes update,
> - reworked arm implementation to avoid exporting inlines,
> - added assert in generic implementation,
>
> V12:
> - remove the 'rte_' prefix from the arm specific functions (David Marchand)
> - use the __atomic_load_ex_xx functions in arm specific implementations of
>   APIs (David Marchand)
> - remove the experimental warnings (David Marchand)
> - tweak the macros working scope (David Marchand)
>
> V11:
> - add rte_ prefix to the __atomic_load_ex_x functions (Ananyev Konstantin)
> - define the above rte_atomic_load_ex_x functions even if not
>   RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED for future non-wfe usages (Ananyev
>   Konstantin)
> - use the above functions for arm specific rte_wait_until_equal_x functions
>   (Ananyev Konstantin)
> - simplify the generic implementation by immersing "if" into "while"
>   (Ananyev Konstantin)
>
> V10:
> - move arm specific stuff to arch/arm/rte_pause_64.h (Ananyev Konstantin)
>
> V9:
> - fix a broken weblink (David Marchand)
> - define rte_wfe and rte_sev() (Ananyev Konstantin)
> - explicitly define three function APIs instead of macros (Ananyev
>   Konstantin)
> - incorporate common rte_wfe and rte_sev into the generic rte_spinlock
>   (David Marchand)
> - define arch neutral RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED (Ananyev Konstantin)
> - define rte_load_ex_16/32/64 functions to use load-exclusive instruction
>   for aarch64, which is required for wake up of WFE
> - drop the rte_spinlock patch from this series, as it calls this
>   experimental API and is widely included by a lot of components, each of
>   which would require ALLOW_EXPERIMENTAL_API in the Makefile and
>   meson.build; leave it for the future, after the experimental tag is
>   removed.
>
> V8:
> - simplify dmb definition to use io barriers (David Marchand)
> - define wfe() and sev() macros and use them inside normal C code (Ananyev
>   Konstantin)
> - pass memorder as a parameter instead of incorporating it into the function
>   name: fewer functions, similar to C11 atomic intrinsics (Ananyev
>   Konstantin)
> - remove mandating RTE_FORCE_INTRINSICS in arm spinlock implementation
>   (David Marchand)
> - undef __WAIT_UNTIL_EQUAL after use (David Marchand)
> - add experimental tag and warning (David Marchand)
> - add the limitation of using WFE instruction in the commit log (David
>   Marchand)
> - tweak the use of RTE_FORCE_INTRINSICS (still mandatory for aarch64) and
>   RTE_ARM_USE_WFE for spinlock (David Marchand)
> - drop the rte_ring patch from this series, as rte_ring.h calls this API
>   and is widely included by a lot of components, each of which would
>   require ALLOW_EXPERIMENTAL_API in the Makefile and meson.build; leave it
>   for the future, after the experimental tag is removed.
>
> V7:
> - fix the checkpatch LONG_LINE_COMMENT issue
>
> V6:
> - squash the RTE_ARM_USE_WFE configuration entry patch into the new API
>   patch
> - move the new configuration to the end of EAL
> - add doxygen comments to reflect the relaxed and acquire semantics
> - correct the meson configuration
>
> V5:
> - add doxygen comments for the new APIs
> - spinlock early exit without wfe if the spinlock is not taken by others
> - add two patches on top for opdl and thunderx
>
> V4:
> - rename the config as CONFIG_RTE_ARM_USE_WFE to indicate it applies to arm
>   only
> - introduce a macro for the assembly skeleton to reduce the duplication of
>   code
> - add one patch for nxp fslmc to address a compiling error
>
> V3:
> - Convert RFCs to patches
>
> V2:
> - Use inline functions instead of macros
> - Add load and compare in the beginning of the APIs
> - Fix some style errors in asm inline
>
> V1:
> - Add the new APIs and use them for ring and locks
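
For readers following the thread without the patches at hand, the generic
(non-Arm) behaviour described in the cover letter, a loop that pauses until a
memory location equals a given value, might look roughly like the sketch
below. This is an illustration only, not the code from the series: the names
sketch_pause and sketch_wait_until_equal_32 are hypothetical stand-ins, the
GCC/Clang __atomic builtins are assumed, and the assert on memorder mirrors
the "added assert in generic implementation" note from V13.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(): PAUSE hint on x86, no-op elsewhere. */
static inline void
sketch_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#endif
}

/*
 * Hypothetical sketch of a generic "wait until equal" helper.
 * Spins until *addr == expected, loading with the requested memory order.
 * Only acquire and relaxed orderings make sense for a pure load.
 */
static inline void
sketch_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
		int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);

	/* The "if" is folded into the "while": test first, pause only on miss. */
	while (__atomic_load_n(addr, memorder) != expected)
		sketch_pause();
}
```

A caller would typically pass __ATOMIC_ACQUIRE when data published by the
writer must be visible once the wait returns, and __ATOMIC_RELAXED when only
the flag itself matters. On aarch64 with WFE enabled, the loop body would be
replaced by a load-exclusive followed by wfe rather than a pause.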

Series applied.
Thanks.

-- 
David Marchand