> On Dec 14, 2025, at 3:37 PM, Joel Fernandes <[email protected]> wrote:
>
>>> On Dec 14, 2025, at 11:38 AM, Mathieu Desnoyers
>>> <[email protected]> wrote:
>>>
>>> On 2025-12-13 17:31, Paul E. McKenney wrote:
>>>
>>> Hello!
>>>
>>> I didn't do a good job of answering the "what about large numbers of
>>> hazard pointers" question at Boqun's and my hazard-pointers talk at
>>> Linux Plumbers Conference yesterday, so please allow me to at least
>>> start on the path towards fixing that problem.
>>>
>>> Also, there were a couple of people participating whose email addresses
>>> I don't know, so please feel free to CC them.
>>>
>>> The trick is that in real workloads to date, although there might be
>>> millions of hazard pointers, there will typically be only a few active
>>> per CPU at a given time. This of course suggests a per-CPU data structure
>>> tracking the active ones. Allocating a hazard pointer grabs an unused one
>>> from this array, or, if all entries are in use, takes memory provided by
>>> the caller and links it into an overflow list. Either way, it returns a
>>> pointer to the hazard pointer that is now visible to updaters. When done,
>>> the caller calls a function that marks the array entry as unused or
>>> removes the element from the list, as the case may be. Because hazard
>>> pointers can migrate among CPUs, full synchronization is required when
>>> operating on the array and the overflow list.
>>>
>>> And either way, the caller is responsible for allocating and freeing the
>>> backup hazard-pointer structure that will be used in case of overflow.
>>> And also either way, the updater need only deal with hazard pointers
>>> that are currently in use.
>>
>> OK, so let me state how I see the fundamental problem you are trying
>> to address, and detail a possible solution.
>>
>> * Problem Statement
>>
>> Assuming we go for an array of hazard-pointer slots per CPU to cover
>> the fast path (common case), we still need to handle the overflow
>> scenario, where more hazard pointers are accessed concurrently on a
>> given CPU than the array size allows, whether due to preemption,
>> nested interrupts, or simply nested calls.
>>
>> * Possible Solution
>>
>> Requiring the HP caller to allocate backup space is clearly something
>> that would cover all scenarios. My worry is that tracking this backup
>> space allocation may be cumbersome for the user, especially if it
>> requires heap allocation.
>>
>> Where the backup space can be allocated will likely depend on how long
>> the HP will be accessed. My working hypothesis here (let me know if
>> I'm wrong) is that most of those HP users will complete their access
>> within the same stack frame where the HP was acquired. This is the
>> primary use case I would like to make sure is convenient.
>>
>> For that use case, the users can simply allocate room on their
>> stack frame for the backup HP slot. The requirement here is that they
>> clear the HP slot before the end of the current stack frame.
>> If there is enough room in the per-CPU array, they use that; else
>> they add the backup slot from their stack into the backup-slot
>> list. When they are done, if they used a backup slot, they need
>> to remove it from the list.
>>
>> There could still be room for more advanced use cases where the
>> backup slots are allocated on the heap, but I suspect that it would
>> make the API trickier to use and should be reserved for use cases
>> that really require it.
>>
>> Thoughts?
>
> This sounds fine to me.
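To make the scheme under discussion concrete, here is a rough sketch of
what the per-CPU slot array plus caller-provided backup could look like.
All names, the slot count, and the struct layout below are made up for
illustration; this is not an existing API, and real code would also need
the memory barriers and locking that Paul mentions for cross-CPU release
and for the overflow list.

#include <stddef.h>

#define HAZPTR_SLOTS_PER_CPU 8		/* Illustrative size only. */

struct hazptr_slot {
	void *ptr;			/* Protected pointer, NULL when unused. */
	struct hazptr_slot *next;	/* Links overflow slots together. */
};

struct hazptr_cpu {
	struct hazptr_slot slots[HAZPTR_SLOTS_PER_CPU];	/* Fast path. */
	struct hazptr_slot *overflow;	/* Caller-provided backup slots. */
	/* A lock is assumed here: slots may be released from another CPU. */
};

/*
 * Acquire a hazard pointer protecting @p.  Try this CPU's array first;
 * if every entry is busy, link the caller-provided @backup slot onto
 * the overflow list.  Either way, return the slot that updaters will
 * now scan.  (Real code also needs a full barrier after publishing the
 * slot, followed by a re-check that @p is still the current pointer.)
 */
struct hazptr_slot *hazptr_acquire(struct hazptr_cpu *hc, void *p,
				   struct hazptr_slot *backup)
{
	int i;

	for (i = 0; i < HAZPTR_SLOTS_PER_CPU; i++) {
		if (!hc->slots[i].ptr) {
			hc->slots[i].ptr = p;
			return &hc->slots[i];
		}
	}

	/* Overflow: all array entries in use, fall back to the backup slot. */
	backup->ptr = p;
	backup->next = hc->overflow;
	hc->overflow = backup;
	return backup;
}

/*
 * Release a hazard pointer.  Array entries are simply marked unused;
 * overflow entries are unlinked so the caller's backup storage may be
 * reused or go out of scope.
 */
void hazptr_release(struct hazptr_cpu *hc, struct hazptr_slot *slot)
{
	struct hazptr_slot **pp;

	if (slot >= hc->slots && slot < hc->slots + HAZPTR_SLOTS_PER_CPU) {
		slot->ptr = NULL;		/* Mark array entry unused. */
		return;
	}

	for (pp = &hc->overflow; *pp; pp = &(*pp)->next) {
		if (*pp == slot) {
			*pp = slot->next;	/* Unlink the backup slot. */
			slot->ptr = NULL;
			return;
		}
	}
}

And here is the stack-allocated backup use case Mathieu describes, where
the backup slot lives only for the duration of the caller's stack frame:

void reader(struct hazptr_cpu *hc, void **gp)
{
	struct hazptr_slot backup;	/* Backup lives only in this frame. */
	struct hazptr_slot *hp;

	hp = hazptr_acquire(hc, *gp, &backup);
	/* ... access the protected object ... */
	hazptr_release(hc, hp);		/* Must run before this frame returns. */
}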
>
> We have had similar issues with kfree_rcu(): running out of preallocated
> memory there meant we just triggered a slow path (synchronize), but for
> hazard pointers I am not sure what such a slow path would be, since this
> problem appears to be on the reader side.
>
> Perhaps we can also preallocate overflow nodes in the API itself, and tap
> into those in case of overflow? The user need not provide their own
> storage for overflow purposes then, I think. And perhaps this preallocated
> pool of overflow nodes can also be common to all hazptr users. Ideally it
> would be good to not have the API user deal with overflow at all and have
> it transparently work behind the scenes.
>
> I think we need not worry too much about the preallocated overflow nodes
> themselves running out, because that is no different from reserved memory
> needed in atomic context, which needs a minimum amount anyway, right? And
> we have a bounded number of CPUs and a bounded number of contexts, so the
> number of *active* nodes required at any given time should also be
> bounded?
>
> Thoughts?
>
> Thanks.
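For what it's worth, a minimal sketch of that idea might look like the
following: a single preallocated pool of overflow nodes shared by all
hazptr users, sized from the bounded-CPUs-times-bounded-contexts
observation above. Again, every name and the pool size are illustrative
assumptions, and a real implementation would need a lock (or a lockless
free list) protecting the pool.

#include <stddef.h>

/* Pool size is a placeholder; roughly nr_cpus * max nesting depth. */
#define HAZPTR_OVERFLOW_POOL_SIZE 1024

struct hazptr_overflow_node {
	void *ptr;				/* Protected pointer, NULL when free. */
	struct hazptr_overflow_node *next;	/* Free-list / overflow-list link. */
};

static struct hazptr_overflow_node hazptr_overflow_pool[HAZPTR_OVERFLOW_POOL_SIZE];
static struct hazptr_overflow_node *hazptr_overflow_free;
/* A spinlock (or lockless stack) protecting hazptr_overflow_free is assumed. */

/* Build the free list once, at init time. */
static void hazptr_overflow_init(void)
{
	int i;

	for (i = 0; i < HAZPTR_OVERFLOW_POOL_SIZE; i++) {
		hazptr_overflow_pool[i].next = hazptr_overflow_free;
		hazptr_overflow_free = &hazptr_overflow_pool[i];
	}
}

/* Called on the overflow slow path instead of asking the caller for storage. */
static struct hazptr_overflow_node *hazptr_overflow_get(void)
{
	struct hazptr_overflow_node *node = hazptr_overflow_free;

	if (node)
		hazptr_overflow_free = node->next;
	return node;	/* NULL only if the pool bound was misjudged. */
}

/* Return a node to the shared pool once the reader is done with it. */
static void hazptr_overflow_put(struct hazptr_overflow_node *node)
{
	node->ptr = NULL;
	node->next = hazptr_overflow_free;
	hazptr_overflow_free = node;
}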
By the way, I wanted to emphasize my point of view that requiring storage
provided by the user seems to negate one of the big benefits of hazard
pointers. Unless there is a solid use case for it, we should probably not
require the user to provide separate storage IMHO (or we should make
allocation internal to the API, as I mentioned).

Thanks.

>> Thanks,
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
