On Thu, Apr 4, 2024 at 8:29 PM Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > > On Thu, Apr 4, 2024 at 10:56 AM Benjamin Tissoires > <benjamin.tissoi...@redhat.com> wrote: > > > > On Thu, Apr 4, 2024 at 6:41 PM Alexei Starovoitov > > <alexei.starovoi...@gmail.com> wrote: > > > > > > On Thu, Apr 4, 2024 at 8:27 AM Benjamin Tissoires > > > <benjamin.tissoi...@redhat.com> wrote: > > > > > > > > > > > > > > > > > > So we need something like: > > > > > > > > > > struct bpf_hrtimer { > > > > > union { > > > > > struct hrtimer timer; > > > > > + struct work_struct work; > > > > > }; > > > > > struct bpf_map *map; > > > > > struct bpf_prog *prog; > > > > > void __rcu *callback_fn; > > > > > void *value; > > > > > union { > > > > > > > > Are you sure we need an union here? If we get to call kfree_rcu() we > > > > need to have both struct rcu_head and sync_work, not one or the other. > > > > > > why? with an extra flag it's one or the other. > > > In bpf_timer_cancel_and_free() > > > if (flag & SLEEPABLE) { > > > schedule_work() to cancel_work_sync + kfree_rcu > > > } else { > > > hrtimer_cancel > > > kfree_rcu > > > } > > > > I thought kfree_rcu required struct rcu_head, and given that we need > > to initialize sync_work it will be poisoned... > > yes. It needs rcu_head. > But where do you see a conflict? > INIT_WORK + schedule_work() will use that space, > then cancel_work_sync() will wait on a different work_struct, > then kfree_rcu() will reuse that space.
Yeah, sorry, I haven't realized that the memory used by kfree_rcu wasn't initialized. > > In case of hrtimers none of the work_structs will be used. > > > > > > > > > > > struct rcu_head rcu; > > > > > + struct work_struct sync_work; > > > > > }; > > > > > + u64 flags; // bpf_timer_init() will require BPF_F_TIMER_SLEEPABLE > > > > > > > > If I understand, you want BPF_F_TIMER_SLEEPABLE in bpf_timer_init() > > > > (like in my v2 or v3 IIRC). But that means that once a timer is > > > > initialized it needs to be of one or the other type (especially true > > > > with the first union in this struct). > > > > > > yes. That's an idea. > > > The code to support wq vs timer seems to be diverging more > > > than what we expected initially. > > > It seems cleaner to set it as init time and enforce in > > > other helpers. > > > > OK, works for me. > > > > > > > > Also with two work_struct-s we're pushing the sizeof(bpf_hrtimer) > > > too far. > > > It's already at 112 bytes and some people use bpf_timer per flow. > > > So potentially millions of such timers. > > > Adding extra sizeof(struct work_struct)=32 * 2 that won't be > > > used is too much. > > > Note that sizeof(struct hrtimer)=64, so unions make everything > > > fit nicely. > > > > Maybe we should do > > union { > > struct hrtimer timer; > > struct { > > struct work_struct work; > > struct work_struct sync_work; > > } > > } > > It's also ok, but sharing rcu_head and work_struct seems > cleaner, since it highlights that they're exclusive. > > > (not nice to read but at least we don't change the size at the beginning) > > > > > > > > > So should we reject during run time bpf_timer_set_callback() for > > > > sleepable timers and only allow bpf_timer_set_sleepable_cb() for > > > > those? (and the invert in the other case). > > > > > > yes. > > > > > > > This version of the patch allows for one timer to be used as softIRQ > > > > or WQ, depending on the timer_set_callback that is used. But it might > > > > be simpler for the kfree_rcu race to define the bpf_timer to be of one > > > > kind, so we are sure to call the correct kfree method. > > > > > > I think one or another simplifies the code and makes it easier > > > to think through combinations. > > > > > > I'm still contemplating adding new "struct bpf_wq" and new kfuncs > > > to completely separate wq vs timer. > > > The code reuse seems to be relatively small. > > > > There is some code reuse in the verifier, but it can be factored out I > > think. > > > > Though the biggest reuse might be in the map portion of bpf_timer, > > which I haven't looked much TBH. > > Right. It's all the 'case BPF_TIMER:' in various places. > New 'struct bpf_wq' would need another entry in btf_field_type. > But that should be a straightforward addition. > > > > > > We can potentially factor out internals of bpf_timer_* into smaller > > > helpers and use them from bpf_timer_* and from new bpf_wq_* kfuncs. > > > > Yeah, also, given that we are going to enforce delay == 0 for > > sleepable timers (wq), the user api would be much cleaner if we can > > have a dedicated bpf_wq (and it would make the flags of bpf_timer_init > > easier to deal with). > > It seems so. > Kinda hard to judge one way or the other without looking at > the final code, but it seems separation is worth attempting, at least. > > Also if we ever do hrtimer+wq we probably will be using > 'struct delayed_work' instead of rolling our own > 'struct hrtimer' + 'struct work_struct' combo. > > It seems wq logic already made such a combination special enough > and thought through the races, so we better just follow that path. > In that case it might be yet another 'struct bpf_delayed_wq' > and another set of kfuncs. > Considering that cancel_work() and cancel_delayed_work() > are separate api in the kernel. > Multiplexing all of them under bpf_timer_cancel() > seems wrong. > In the past we were somewhat limited in terms of helpers. > We tried not to add them unless absolutely necessary because > of uapi considerations. > Now with kfuncs we can add/tweak/remove them at will. > > > > > > > > > One more thing. > > > bpf_timer_cancel() api turned out to be troublesome. > > > Not only it cancels the timer, but drops callback too. > > > It was a surprising behavior for people familiar with > > > kernel timer api-s. > > > We should not repeat this mistake with wq. > > > > > > We can try to fix bpf_timer_cancel() too. > > > If we drop drop_prog_refcnt() from it it shouldn't affect > > > existing bpf_timer users who are forced to do: > > > bpf_timer_cancel() > > > bpf_timer_set_callback() > > > bpf_timer_start() > > > all the time. > > > If/when bpf_timer_cancel() stops dropping the callback > > > such bpf prog won't be affected. So low chance of breaking any prog. > > > wdyt? > > > > > > > How would a program know set_callback() is not required after a > > cancel() because the kernel kept it around? It seems that it's going > > to be hard for them to know that (unless by trying first a start()), > > and it will add more code. > > > > timer_cancel() would be hard to change but we can always do the change > > and add a new kfunc timer_cancel_no_drop() which would clearly allow > > that works too. > > > for new programs to know that set_callback() is not required to be > > called. In a few kernel releases we could remove it and say that > > timer_cancel() is the same (and replaced by a #define) > > #define won't work, since mechanics of detecting and calling > helpers vs kfuncs is quite different. > > > Anyway, the more I think of it, the more I think the best API would be > > a dedicated wq API. It's probably going to need a little bit more > > work, but it'll be more or less this work plus the new bpf_wq type in > > the map. > > It seems to me as well. > > Thanks for brainstorming. > Alright, as of today (and I'm about to be AFK for the weekend), I got your changes in and working (I think). I'll review the series on Monday and send it back so we have a baseline to compare it to with bpf_wq. Cheers, Benjamin