On Wed, 11 Mar 2026 at 01:23, Samuel Wu <[email protected]> wrote:
>
> On Sat, Mar 7, 2026 at 12:17 PM Kumar Kartikeya Dwivedi
> <[email protected]> wrote:
> >
> > On Sat, 7 Mar 2026 at 00:28, Kumar Kartikeya Dwivedi <[email protected]>
> > wrote:
> > [...]
> > > I think it's still unclear why this cannot be achieved by exploring
> > > what Andrii proposed initially in [0], i.e. by using existing
> > > primitives and adding extra kfuncs where necessary to solve
> > > bottlenecks, and I didn't see any indication where it was explored
> > > sufficiently.
>
> This is my bad, I misinterpreted/misunderstood Andrii's original
> proposal to apply only to open-coded BPF iterators, hence the follow
> up patches with them removed. However, I don't disagree this can be
> achieved via open-coding. Both proposals are viable to traverse wakeup
> sources via BPF, however I see two distinct advantages of the BPF
> iterator:
>
> 1. Iterators provide a stable kernel interface that decouples the BPF
> code from the kernel's internal data structures and refactors.
I think this is my concern: committing to a stable interface that
we'll have to maintain forever. It's a narrow use case that can be
better served by dedicated kfuncs and rolling your own solution. There
will be no end to the number of iterators if we keep making one for
every linked list in the kernel. It would be different if the
synchronization scheme for the wakeup_sources list were really
complicated and hard to express inline in the verifier, but that is
not the case.

> 2. If wakeup sources are accessible via BPF, the next step should be
> to deprecate the existing interfaces for wakeup sources, such as the
> sysfs and debugfs stats. In that case, a text-based BPF iterator
> output will provide less friction during the transition.

I would agree if this were a one-to-one conversion, where you just
replace the directory used for lookups. But you'll be reading from one
file vs. several, so it's already a non-trivial change in the
consumer. At that point, the conversion burden doesn't have the same
weight in deciding whether to stick with the textual representation.

> > > I think all we need is the wakeup source read_lock()/unlock() APIs
> > > as KF_ACQUIRE/KF_RELEASE kfuncs. Just need to ensure we pair each
> > > lock with unlock by passing the index, which can be an opaque
> > > value returned from the kfunc (e.g. dummy empty struct pointer
> > > encoding the index). Then you can use bpf_for() to open-code the
> > > list iteration.
> >
> > I had a longer discussion with Paul on this. When having multiple
> > open SRCU sections, mixing nested calls doesn't flatten the
> > protection scopes and leads to all kinds of issues, an example of
> > which is in [0].
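For concreteness, the open-coded walk quoted above could look roughly
like the sketch below. This is untested and purely illustrative: the
kfunc names, the opaque cookie type, and the variable holding the list
head address are all made up here; the sketch only assumes
KF_ACQUIRE/KF_RELEASE wrappers around the kernel's existing
wakeup_sources_read_lock()/wakeup_sources_read_unlock() SRCU helpers.

```c
/* Untested sketch. bpf_wakeup_sources_read_lock()/_unlock() are
 * HYPOTHETICAL KF_ACQUIRE/KF_RELEASE kfuncs wrapping the kernel's
 * wakeup_sources_read_lock()/wakeup_sources_read_unlock(); the opaque
 * cookie stands in for the returned SRCU index.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include "bpf_experimental.h"	/* bpf_for() */

struct ws_srcu_cookie;	/* opaque, encodes the SRCU read-side index */

extern struct ws_srcu_cookie *bpf_wakeup_sources_read_lock(void) __ksym __weak;
extern void bpf_wakeup_sources_read_unlock(struct ws_srcu_cookie *c) __ksym __weak;

/* Address of the static 'wakeup_sources' list_head, filled in from
 * user space before load (e.g. obtained via kallsyms lookup).
 */
const volatile __u64 wakeup_sources_addr;

SEC("syscall")
int dump_wakeup_sources(void *ctx)
{
	struct ws_srcu_cookie *c;
	struct list_head *head, *pos;
	struct wakeup_source *ws;

	c = bpf_wakeup_sources_read_lock();
	if (!c)
		return 0;

	head = bpf_core_cast((void *)wakeup_sources_addr, struct list_head);
	pos = head->next;

	bpf_for(i, 0, 4096) {	/* bounded walk to keep the verifier happy */
		if (pos == head)
			break;
		/* container_of() as defined in the selftests' helpers */
		ws = container_of(pos, struct wakeup_source, entry);
		/* read ws->name, ws->active_count, ... here; the accesses
		 * go through untrusted pointers, so they cannot fault */
		pos = pos->next;
	}

	bpf_wakeup_sources_read_unlock(c);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The verifier obligation is then exactly what is described above: each
acquired cookie must eventually be released, in any order.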
> > That said, in this case, the verifier isn't tracking what the
> > underlying SRCU section protects and isn't even aware of it; all it
> > needs to ensure is that the section is closed eventually.
> > Out-of-order unlocks are ok, it would just exist to allow looking at
> > the list in a stable fashion using safe bpf_core_cast() accesses.
> >
> > So, I just wanted to clarify why I think this should be perfectly ok.
> >
> > [0]:
> > https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/SRCUhoha.html
> >
> > (if you are like me, please follow the link at the bottom of [0]
> > before feeling good about understanding the example).
>
> Appreciate the thoughtful feedback from you and Paul. The example was
> nuanced but well-explained by the post; however, a key difference I
> see is that the BPF iterator solution creates continuous locking
> segments during traversal, as opposed to hand-over-hand locking in
> the example provided. I could be missing something fundamental here;
> do you mind elaborating?

It was just an illustration to identify what guarantees the verifier
would need to enforce. The most we need to do is ensure that SRCU
indices are passed back during unlock, in whatever order. The critical
section will be using bpf_core_cast() anyway to do safe walking of the
list.

With the KF_ACQUIRE/KF_RELEASE kfuncs you will still have a continuous
locking segment. Just enclose your list-walking loop between lock()
and unlock().

> > > The iterator would become unnecessary, and everything can be done
> > > from a syscall program's BPF_PROG_RUN invocation. I didn't check
> > > how we can get the pointer to the wakeup_sources LIST_HEAD, but
> > > you could try several options (__ksym,
> > > bpf_kallsyms_lookup_name+bpf_core_cast and then walk it, or return
> > > pointer from a new kfunc and walk it using
> > > bpf_for()+bpf_core_cast()).

> Getting the static variable is key to the open-coded approach, but a
> clean way forward isn't obvious to me.
> Passing around raw addresses seems brittle and can raise security
> concerns. The wrapper kfunc to

I don't understand the security concern. All of this happens inside
the BPF program. If you're writing these addresses to user space and
leaking them, it's your bug. You can do that with iterators too. You
have all the raw addresses available to you.

> get the static address might be the best option, but it feels like it
> violates the power subsystem's encapsulation. The iterator approach
> abstracts all of this cleanly.

> > > Also, how much speedup does this really provide? It is hard to
> > > understand the motivation from the current cover letter. Is the
> > > bottleneck due to opening multiple files in sysfs? You will still
> > > have to read text and parse text from the iterator's output in
> > > your current solution. Wouldn't it be faster if you just used a
> > > syscall program, and used binary output (using a ring buffer) to
> > > stream data to user space?
>
> With the BPF iterator approach on a Pixel 6 in a quiescent state, I'm
> seeing a 10x speedup when traversing wakeup sources; the speedup
> would likely be even greater on a loaded system.

I'm sure you can replicate it with the approach described above. It
might even be faster than parsing text from the file.

> As alluded to earlier, the next step is to remove the
> /sys/class/wakeup node and debugfs nodes. My apologies for not
> explicitly listing this in the cover letter, but ultimately this
> would save memory by removing those nodes, eliminate work by not
> creating the nodes, and improve performance with a faster traversal,
> all while transparently replacing legacy interfaces.

And you can do that regardless of the type of solution chosen above.

> Thanks!
> Sam
>
> [...]
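To make the binary-output suggestion concrete, the BPF side could emit
fixed-size records from the list-walking loop. This is an untested
sketch; struct ws_rec, its fields, and the emit_ws() helper are
illustrative only, not a proposed ABI.

```c
/* Untested sketch of the ring buffer binary output path; the record
 * layout is made up for illustration.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct ws_rec {
	char name[32];
	__u64 event_count;
	__u64 active_count;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} rb SEC(".maps");

/* called once per wakeup source from the list-walking loop */
static void emit_ws(struct wakeup_source *ws)
{
	struct ws_rec *rec;

	rec = bpf_ringbuf_reserve(&rb, sizeof(*rec), 0);
	if (!rec)
		return;
	bpf_probe_read_kernel_str(rec->name, sizeof(rec->name), ws->name);
	rec->event_count = ws->event_count;
	rec->active_count = ws->active_count;
	bpf_ringbuf_submit(rec, 0);
}
```

User space would then trigger the walk with bpf_prog_test_run_opts()
(i.e. BPF_PROG_RUN on the syscall program) and drain records with
libbpf's ring_buffer__poll(); consuming fixed-size structs should beat
opening and parsing many sysfs text files.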

