On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi
<[email protected]> wrote:
>
> On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <[email protected]> wrote:

[...]

> > I think it's still unclear why this cannot be achieved by exploring
> > what Andrii proposed initially in [0], i.e. by using existing
> > primitives and adding extra kfuncs where necessary to solve
> > bottlenecks, and I didn't see any indication where it was explored
> > sufficiently.
> >
This is my bad: I misread Andrii's original proposal as applying only
to open-coded BPF iterators, hence the follow-up patches with them
removed. That said, I don't disagree this can be achieved via
open-coding. Both proposals are viable ways to traverse wakeup
sources via BPF; however, I see two distinct advantages in the BPF
iterator:

1. Iterators provide a stable kernel interface that decouples the BPF
code from the kernel's internal data structures and any future
refactoring of them.
2. If wakeup sources are accessible via BPF, the next step should be
to deprecate the existing interfaces for wakeup sources, such as the
sysfs and debugfs stats. In that case, a text-based BPF iterator
output will provide less friction during the transition.

> > I think all we need is the wakeup source read_lock()/unlock() APIs as
> > KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each lock
> > with unlock by passing the index, which can be an opaque value
> > returned from the kfunc (e.g. dummy empty struct pointer encoding the
> > index). Then you can use bpf_for() to open-code the list iteration.
>
> I had a longer discussion with Paul on this. When having multiple open
> SRCU sections, mixing nested calls doesn't flatten the protection
> scopes and leads to all kinds of issues, an example of which is in
> [0]. That said, in this case, the verifier isn't tracking what the
> underlying SRCU section protects or isn't even aware of it, all it
> needs to ensure is that section is closed eventually, out-of-order
> unlocks are ok, it would just exist to allow looking at the list in a
> stable fashion using safe bpf_core_cast() accesses.
>
> So, I just wanted to clarify why I think this should be perfectly ok.
>
>   [0]: 
> https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html
>
> (if you are like me, please follow the link at the bottom of [0]
> before feeling good about understanding the example).

Appreciate the thoughtful feedback from you and Paul. The example was
nuanced but well explained by the post. However, a key difference I
see is that the BPF iterator solution holds one continuous locking
segment during traversal, as opposed to the hand-over-hand locking in
the example provided. I could be missing something fundamental here;
do you mind elaborating?

>
> > The iterator would become unnecessary, and everything can be done from
> > a syscall program's BPF_PROG_RUN invocation. I didn't
> > check how we can get the pointer to the wakeup_sources LIST_HEAD, but
> > you could try several options (__ksym,
> > bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> > pointer from a new kfunc and walk it using bpf_for()+bpf_core_cast()).
> >

Getting the static variable is key to the open-coded approach, but a
clean way forward isn't obvious to me. Passing around raw addresses
seems brittle and can raise security concerns. The wrapper kfunc to
get the static address might be the best option, but it feels like it
violates the power subsystem's encapsulation. The iterator approach
abstracts all of this cleanly.

> > Also, how much speed up does this really provide? It is hard to
> > understand the motivation from the current cover letter. Is the
> > bottleneck due to opening multiple files in sysfs? You will still have
> > to read text and parse text from the iterator's output in your current
> > solution. Wouldn't it be faster if you just used a syscall program,
> > and used binary output (using a ring buffer) to stream data to user
> > space?

With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
seeing a 10x speedup when traversing wakeup sources; the speedup would
likely be even greater on a loaded system.

As alluded to earlier, the next step is to remove the
/sys/class/wakeup node and the debugfs nodes. My apologies for not
explicitly listing this in the cover letter. Ultimately this would
save the memory and setup work of creating those nodes and improve
performance with a faster traversal, all while transparently
replacing the legacy interfaces.

Thanks!
Sam

[...]
