On 2024-09-21 23:07, Lai Jiangshan wrote:
+/*
+ * hpref_hp_get: Obtain a reference to a stable object, protected either
+ *               by hazard pointer (fast-path) or using reference
+ *               counter as fall-back.
+ */
+static inline
+bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
+{
+       int cpu = rseq_current_cpu_raw();
+       struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
+       struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
+       bool use_refcount = false;
+       struct hpref_node *node, *node2;
+       unsigned int next_slot;
+
+retry:
+       node = uatomic_load(node_p, CMM_RELAXED);
+       if (!node)
+               return false;
+       /* Use rseq to try setting current slot hp. Store B. */
+       if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
+                               (intptr_t *) &slot->node, (intptr_t) NULL,
+                               (intptr_t) node, cpu)) {
+               slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];

Can @cpu possibly change here? If it can, it seems @cpu and @cpu_slots
should be updated first.

Indeed, if migration happens between rseq_current_cpu_raw() and the
execution of rseq_load_cbne_store__ptr(), it will cause this second
operation to fail. This in turn could cause the reader to retry for a
long time (possibly forever) until it finally migrates back to the
original CPU. As you suggest, we should re-load cpu and cpu_slots
after a failure here. More specifically, we should re-load them when
the rseq critical section fails with -1, which means it was aborted
or there was a cpu number mismatch. If the rseq critical section
returns 1, the slot did not contain NULL, so all we need to do is
move over to the next slot.
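To make this concrete, here is a rough sketch of the reworked retry
handling (hypothetical code, not the final patch; it reuses the cpu,
cpu_slots, slot and node variables from the function quoted above,
and assumes the return-value convention just described: 0 on success,
1 when the slot was not NULL, -1 on abort or cpu number mismatch):

	int ret;

retry:
	node = uatomic_load(node_p, CMM_RELAXED);
	if (!node)
		return false;
	/* Try to publish @node in the current slot. Store B. */
	ret = rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
			(intptr_t *) &slot->node, (intptr_t) NULL,
			(intptr_t) node, cpu);
	if (ret == -1) {
		/*
		 * Abort or cpu number mismatch (e.g. after migration):
		 * re-load cpu and cpu_slots before retrying.
		 */
		cpu = rseq_current_cpu_raw();
		cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
		slot = &cpu_slots->slots[cpu_slots->current_slot];
		goto retry;
	}
	if (ret == 1) {
		/* Slot did not contain NULL: move over to the next slot. */
		slot++;		/* Bounds/fallback handling omitted here. */
		goto retry;
	}
	/* ret == 0: hazard pointer is published in @slot. */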

While applying your suggested change, I noticed that I can improve
the fast path by removing the notion of a "current_slot" number, and
thus remove the increment of this hint from the fast path. The fast
path will instead just scan all 8 slots, trying to find a NULL one.
This also lessens the odds that the fast path must fall back to the
refcount in case of concurrent use. It provides a 50% performance
improvement for the fast path with barrier/membarrier pairing.
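In sketch form, the scanning fast path could look like the following
(again hypothetical, one possible shape of what is in the branch; it
assumes HPREF_NR_PERCPU_SLOTS is 8 with HPREF_EMERGENCY_SLOT reserved
as one of them, and the retry label refers to the re-load logic shown
above; the emergency-slot busy-wait then proceeds as in the quoted
code):

	unsigned int i;
	int ret;

	/*
	 * Scan the regular slots, trying to publish @node into a NULL
	 * one. Store B.
	 */
	for (i = 0; i < HPREF_NR_PERCPU_SLOTS; i++) {
		if (i == HPREF_EMERGENCY_SLOT)
			continue;	/* Reserved for the fallback path. */
		slot = &cpu_slots->slots[i];
		ret = rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED,
				RSEQ_PERCPU_CPU_ID,
				(intptr_t *) &slot->node, (intptr_t) NULL,
				(intptr_t) node, cpu);
		if (!ret)
			break;		/* Hazard pointer set in @slot. */
		if (ret == -1)
			goto retry;	/* Abort/migration: re-load cpu and cpu_slots. */
		/* ret == 1: slot busy, try the next one. */
	}
	if (i == HPREF_NR_PERCPU_SLOTS) {
		/* All slots busy: fall back to emergency slot + refcount. */
		slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
		use_refcount = true;
	}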

I've updated the volatile feature branch at
https://github.com/compudj/userspace-rcu-dev/tree/hpref with these
changes. I'll give others some time to provide feedback on the
overall idea before sending out an updated patch.

The remainder of the quoted hunk, for reference, shows the fallback
to the emergency slot:

+               use_refcount = true;
+               /*
+                * This may busy-wait for another reader using the
+                * emergency slot to transition to refcount.
+                */
+               caa_cpu_relax();
+               goto retry;
+       }
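For completeness, the hazard pointer published above (Store B) still
has to be validated against a re-load of *node_p (Load A, using the
node2 variable declared earlier in the function); this is the load
that pairs with Store B through the barrier/membarrier scheme
mentioned above. Roughly (a sketch, not the exact patch code; a plain
cmm_smp_mb() stands in here for the lighter reader-side barrier
paired with membarrier on the updater side):

	/* Memory ordering: make Store B visible before Load A. */
	cmm_smp_mb();
	node2 = uatomic_load(node_p, CMM_RELAXED);	/* Load A. */
	if (node != node2) {
		/* Object changed under us: release the slot and retry. */
		uatomic_store(&slot->node, NULL, CMM_RELAXED);
		if (!node2)
			return false;
		goto retry;
	}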

Thanks for your feedback!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

