On Mon, Feb 24, 2020 at 10:29 PM Stephen Hemminger
<step...@networkplumber.org> wrote:
>
> On Mon, 24 Feb 2020 11:35:09 +0000
> Konstantin Ananyev <konstantin.anan...@intel.com> wrote:
>
> > Upfront note - this RFC is not a complete patch.
> > It introduces an ABI breakage, plus it doesn't update ring_elem
> > code properly, etc.
> > I plan to deal with all these things in later versions.
> > Right now I seek initial feedback about the proposed ideas.
> > I would also ask people to repeat the performance tests (see below)
> > on their platforms to confirm the impact.
> >
> > More and more customers use (or try to use) DPDK-based apps within
> > overcommitted systems (multiple active threads over the same physical cores):
> > VM, container deployments, etc.
> > One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> > LHP is quite a common problem for spin-based sync primitives
> > (spin-locks, etc.) on overcommitted systems.
> > The situation gets much worse when some sort of
> > fair-locking technique is used (ticket-lock, etc.),
> > because then the scheduling order of not only the lock owner
> > but also the lock waiters matters a lot.
> > This is a well-known problem for kernel within VMs:
> > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > The problem with rte_ring is that while head acquisition is a sort of
> > unfair locking, waiting on the tail is very similar to a ticket-lock scheme:
> > the tail has to be updated in a particular order.
> > That makes the current rte_ring implementation perform
> > really poorly in some overcommitted scenarios.
>
> Rather than reform rte_ring to fit this scenario, it would make
> more sense to me to introduce another primitive. The current lockless
> ring performs very well for the isolated thread model that DPDK
> was built around. This looks like a case of customers violating
> the usage model of the DPDK and then being surprised at the fallout.

I agree with Stephen here.

I think adding more runtime checks in enqueue() and dequeue() will
have a bad effect on the low-end cores too.
But I agree with the problem statement: in the virtualization use
case, it may be possible to have N virtual cores running on one physical
core.

IMO, the best solution would be to keep the ring API the same and offer
different flavors at compile time, something like what
liburcu did to accommodate different flavors.

i.e., urcu-qsbr.h and urcu-bp.h have identical API definitions. The
application can simply include ONE header file in a C file based on
the flavor.
If both are needed at runtime, the application needs function pointers or
similar, and defines the functions in different C files, each including
the appropriate flavor header.

#include <urcu-qsbr.h> /* QSBR RCU flavor */
#include <urcu-bp.h> /* Bulletproof RCU flavor */
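
To make the idea concrete, here is a rough single-file sketch of how such
flavors could look for a ring. Every name in it (demo_ring,
DEMO_RING_FLAVOR_OVERCOMMIT, the _default/_overcommit suffixes) is
hypothetical and exists in neither DPDK nor liburcu, and the function bodies
are single-threaded placeholders rather than real lockless code. In a real
layout each flavor would live in its own header and C file.

```c
/* Hypothetical sketch only: none of these names exist in DPDK or liburcu. */
#define DEMO_RING_SIZE 8u

struct demo_ring {
    int slots[DEMO_RING_SIZE];
    unsigned int head; /* total items enqueued so far */
    unsigned int tail; /* total items dequeued so far */
};

/* Single-threaded placeholder logic shared by both flavors. Real flavors
 * would differ in how they synchronize the head/tail updates. */
static inline int demo_enqueue_common(struct demo_ring *r, int v)
{
    if (r->head - r->tail == DEMO_RING_SIZE)
        return -1; /* ring is full */
    r->slots[r->head++ % DEMO_RING_SIZE] = v;
    return 0;
}

static inline int demo_dequeue_common(struct demo_ring *r, int *v)
{
    if (r->head == r->tail)
        return -1; /* ring is empty */
    *v = r->slots[r->tail++ % DEMO_RING_SIZE];
    return 0;
}

/* Flavor 1: what a "default" flavor header would provide. */
static inline int demo_ring_enqueue_default(struct demo_ring *r, int v)
{ return demo_enqueue_common(r, v); }
static inline int demo_ring_dequeue_default(struct demo_ring *r, int *v)
{ return demo_dequeue_common(r, v); }

/* Flavor 2: what an "overcommit" flavor header would provide. */
static inline int demo_ring_enqueue_overcommit(struct demo_ring *r, int v)
{ return demo_enqueue_common(r, v); }
static inline int demo_ring_dequeue_overcommit(struct demo_ring *r, int *v)
{ return demo_dequeue_common(r, v); }

/* Compile-time selection: the application always calls the same names. */
#ifdef DEMO_RING_FLAVOR_OVERCOMMIT
#define demo_ring_enqueue demo_ring_enqueue_overcommit
#define demo_ring_dequeue demo_ring_dequeue_overcommit
#else
#define demo_ring_enqueue demo_ring_enqueue_default
#define demo_ring_dequeue demo_ring_dequeue_default
#endif

/* Runtime selection: the application keeps function pointers per flavor. */
struct demo_ring_ops {
    const char *name;
    int (*enqueue)(struct demo_ring *r, int v);
    int (*dequeue)(struct demo_ring *r, int *v);
};

static const struct demo_ring_ops demo_ring_ops_default = {
    "default", demo_ring_enqueue_default, demo_ring_dequeue_default
};
static const struct demo_ring_ops demo_ring_ops_overcommit = {
    "overcommit", demo_ring_enqueue_overcommit, demo_ring_dequeue_overcommit
};
```

Code that only ever needs one flavor calls demo_ring_enqueue() directly and
pays no extra runtime check; only code that needs both flavors at runtime
goes through a demo_ring_ops table and pays for the indirect call.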
