> On Dec 14, 2025, at 3:37 PM, Joel Fernandes <[email protected]> wrote:
> 
> 
> 
>>> On Dec 14, 2025, at 11:38 AM, Mathieu Desnoyers 
>>> <[email protected]> wrote:
>>> 
>>> On 2025-12-13 17:31, Paul E. McKenney wrote:
>>> Hello!
>>> 
>>> I didn't do a good job of answering the "what about large numbers of
>>> hazard pointers" question at Boqun's and my hazard-pointers talk at
>>> Linux Plumbers Conference yesterday, so please allow me to at least
>>> start on the path towards fixing that problem.
>>> 
>>> Also, there were a couple of people participating whose email addresses
>>> I don't know, so please feel free to CC them.
>>> 
>>> The trick is that in real workloads to date, although there might be
>>> millions of hazard pointers, there will typically only be a few active
>>> per CPU at a given time.  This of course suggests a per-CPU data structure
>>> tracking the active ones.  Allocating a hazard pointer grabs an unused one
>>> from this array, or, if all entries are in use, takes memory provided by
>>> the caller and links it into an overflow list.  Either way, it returns a
>>> pointer to the hazard pointer that is now visible to updaters.  When done,
>>> the caller calls a function that marks the array entry as unused or
>>> removes the element from the list, as the case may be.  Because hazard
>>> pointers can migrate among CPUs, full synchronization is required when
>>> operating on the array and the overflow list.
>>> 
>>> And either way, the caller is responsible for allocating and freeing the
>>> backup hazard-pointer structure that will be used in case of overflow.
>>> 
>>> And also either way, the updater need only deal with hazard pointers
>>> that are currently in use.
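
To make sure I am reading the above correctly, here is a rough userspace
sketch of the scheme as I picture it. All names and sizes are invented, a
plain pthread mutex stands in for whatever synchronization the real thing
would use, and the usual re-validation of the protected pointer after
publication is left out:

#include <pthread.h>
#include <stddef.h>

#define HP_SLOTS_PER_CPU 8	/* invented size of the per-CPU fast-path array */
#define NR_CPUS 64		/* invented CPU count for this userspace sketch */

struct hp_overflow_node {		/* caller-provided backup storage */
	void *ptr;			/* the published hazard pointer */
	struct hp_overflow_node *next;
};

struct hp_cpu {
	void *slots[HP_SLOTS_PER_CPU];	/* NULL means the slot is unused */
	struct hp_overflow_node *overflow; /* overflow list head */
	pthread_mutex_t lock;		/* full sync: HPs can migrate among CPUs */
};

static struct hp_cpu hp_percpu[NR_CPUS];

static void hp_init(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		pthread_mutex_init(&hp_percpu[i].lock, NULL);
}

/*
 * Publish @ptr as a hazard pointer on @cpu.  Returns the slot that updaters
 * will scan: either a free per-CPU array entry or the caller's @backup node,
 * which gets linked into the overflow list when the array is full.
 */
static void **hp_acquire(int cpu, void *ptr, struct hp_overflow_node *backup)
{
	struct hp_cpu *c = &hp_percpu[cpu];
	void **slot = NULL;

	pthread_mutex_lock(&c->lock);
	for (int i = 0; i < HP_SLOTS_PER_CPU; i++) {
		if (!c->slots[i]) {
			c->slots[i] = ptr;
			slot = &c->slots[i];
			break;
		}
	}
	if (!slot) {			/* array full: use the backup node */
		backup->ptr = ptr;
		backup->next = c->overflow;
		c->overflow = backup;
		slot = &backup->ptr;
	}
	pthread_mutex_unlock(&c->lock);
	return slot;
}

/*
 * Retire the hazard pointer.  @cpu is the CPU whose structure the slot was
 * taken from (recorded at acquire time, even if the task has since migrated).
 */
static void hp_release(int cpu, void **slot, struct hp_overflow_node *backup)
{
	struct hp_cpu *c = &hp_percpu[cpu];

	pthread_mutex_lock(&c->lock);
	if (slot == &backup->ptr) {	/* backup was used: unlink it */
		struct hp_overflow_node **np;

		for (np = &c->overflow; *np; np = &(*np)->next) {
			if (*np == backup) {
				*np = backup->next;
				break;
			}
		}
	} else {
		*slot = NULL;		/* mark the array entry unused */
	}
	pthread_mutex_unlock(&c->lock);
}

So hp_acquire() hands back either an array entry or the caller's backup node,
and hp_release() undoes whichever one was actually used.
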
>> OK, so let me state how I see the fundamental problem you are trying
>> to address, and detail a possible solution.
>> 
>> * Problem Statement
>> 
>> Assuming we go for an array of hazard pointer slots per CPU to cover
>> the fast path (common case), we still need to handle the
>> overflow scenario, where more hazard pointers are in use
>> concurrently on a given CPU than there are array slots, whether due
>> to preemption, nested interrupts, or simply nested calls.
>> 
>> * Possible Solution
>> 
>> Requiring the HP caller to allocate backup space is clearly something
>> that would cover all scenarios. My worry is that tracking this backup
>> space allocation may be cumbersome for the user, especially if this
>> requires heap allocation.
>> 
>> Where the backup space can be allocated will likely depend on how long
>> the HP will be accessed. My working hypothesis here (let me know if
>> I'm wrong) is that most of those HP users will complete their access
>> within the same stack frame where the HP was acquired. This is the
>> primary use-case I would like to make sure is convenient.
>> 
>> For that use-case the users can simply allocate room on their
>> stack frame for the backup HP slot. The requirement here is that they
>> clear the HP slot before the end of the current stack frame.
>> If there is enough room in the per-CPU array, they use that, else
>> they add the backup slot from their stack into the backup slot
>> list. When they are done, if they used a backup slot, they need
>> to remove it from the list.
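
For that stack-frame case, the calling pattern I picture (building on the
made-up hp_acquire()/hp_release() and struct hp_overflow_node from the sketch
further up) would look roughly like this:

struct foo {
	int value;
};

static int read_foo_value(int cpu, struct foo *p)
{
	struct hp_overflow_node backup;	/* backup HP slot lives on the stack */
	void **slot;
	int v;

	/* Take a per-CPU array slot if one is free, else publish via &backup. */
	slot = hp_acquire(cpu, p, &backup);

	/*
	 * Real hazard-pointer usage would re-check that @p is still reachable
	 * after publication; omitted here, since this only shows slot lifetime.
	 */
	v = p->value;

	/* Must release before this frame (and @backup with it) goes away. */
	hp_release(cpu, slot, &backup);
	return v;
}
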
>> 
>> There could still be room for more advanced use-cases where the
>> backup slots are allocated on the heap, but I suspect that it would
>> make the API trickier to use and should be reserved for use-cases
>> that really require it.
>> 
>> Thoughts?
> 
> This sounds fine to me.
> 
> We have had similar issues with kfree_rcu(): running out of preallocated
> memory there meant we just triggered a slow path (synchronize_rcu()), but
> for hazard pointers I am not sure what such a slow path would be, since
> this problem appears to be on the reader side.
> 
> Perhaps we can also preallocate overflow nodes in the API itself, and tap
> into those in case of overflow? The user then need not provide their own
> storage for overflow purposes, I think. And perhaps this preallocated pool
> of overflow nodes could be common to all hazptr users. Ideally, the API user
> would not have to deal with overflow at all; it would transparently work
> behind the scenes.
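
Concretely, I would picture that shared preallocated pool along these lines;
again, all names and the pool size are invented, and a plain mutex stands in
for the real synchronization:

#include <pthread.h>
#include <stddef.h>

/* Same made-up backup-node type as in the earlier sketch. */
struct hp_overflow_node {
	void *ptr;
	struct hp_overflow_node *next;
};

#define HP_POOL_SIZE 256	/* invented pool size */

static struct hp_overflow_node hp_pool[HP_POOL_SIZE];
static struct hp_overflow_node *hp_pool_free;
static pthread_mutex_t hp_pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread every pool entry onto the free list at init time. */
static void hp_pool_init(void)
{
	for (int i = 0; i < HP_POOL_SIZE; i++) {
		hp_pool[i].next = hp_pool_free;
		hp_pool_free = &hp_pool[i];
	}
}

/* On per-CPU array overflow, take a node from the shared pool. */
static struct hp_overflow_node *hp_pool_get(void)
{
	struct hp_overflow_node *node;

	pthread_mutex_lock(&hp_pool_lock);
	node = hp_pool_free;
	if (node)
		hp_pool_free = node->next;
	pthread_mutex_unlock(&hp_pool_lock);
	return node;		/* NULL means the pool is exhausted */
}

/* Give the node back once the hazard pointer has been released. */
static void hp_pool_put(struct hp_overflow_node *node)
{
	pthread_mutex_lock(&hp_pool_lock);
	node->next = hp_pool_free;
	hp_pool_free = node;
	pthread_mutex_unlock(&hp_pool_lock);
}

The acquire path would then call hp_pool_get() itself on overflow, instead of
taking a caller-supplied backup node.
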
> 
> I think we need not worry too much about the preallocated overflow nodes
> themselves running out, because that is no different from reserved memory
> needed in atomic context, which needs a minimum reserve anyway, right? And
> we have a bounded number of CPUs and a bounded number of contexts, so the
> number of *active* nodes required at any given time should also be bounded?
> 
> Thoughts?
> 
> Thanks.

By the way, I wanted to emphasize my PoV: requiring the user to provide
storage seems to negate one of the big benefits of hazard pointers, I think.
Unless there is a solid use case for it, we should probably not require the
user to provide separate storage, IMHO (or we should make the allocation
internal to the API, as I mentioned).

thanks.

>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>> 
