On Thu, Jan 22, 2026 at 03:56 AM GMT, Jiayuan Chen wrote:
> January 21, 2026 at 20:55, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
>> January 21, 2026 at 17:36, "Jakub Sitnicki" <jakub@cloudflare.com> wrote:
>> > I've been thinking about this some more and came to the conclusion that
>> > this udp_bpf_ioctl implementation is actually what we want, while
>> > tcp_bpf_ioctl *should not* be checking if the sk_receive_queue is
>> > non-empty.
>> >
>> > Why? Because the verdict prog might redirect or drop the skbs from
>> > sk_receive_queue once it actually runs. The messages might never appear
>> > on the msg_ingress queue.
>> >
>> > What I think we should be doing, in the end, is kicking the
>> > sk_receive_queue processing on bpf_map_update_elem, if there's data
>> > ready.
>> >
>> > The API semantics I'm proposing is:
>> >
>> > 1. ioctl(FIONREAD) -> reports N bytes
>> > 2. bpf_map_update_elem(sk) -> socket inserted into sockmap
>> > 3. poll() for POLLIN -> wait for socket to be ready to read
>> > 5. ioctl(FIONREAD) -> report N bytes if verdict prog didn't
>> >    redirect or drop it
>> >
>> > We don't have to add the queue kick on map update in this series.
>> >
>> > If you decide to leave it for later, can I ask that you open an issue at
>> > our GH project [1]?
>> >
>> > I don't want it to fall through the cracks. And I sometimes have people
>> > asking what they could help with in sockmap.
>> >
>> > Thanks,
>> > -jkbs
>> >
>> > [1] https://github.com/sockmap-project/sockmap-project/issues
>> >
>> Hi Jakub,
>>
>> Thanks for taking the time to think through this carefully. I agree with your
>> analysis - reporting sk_receive_queue length is misleading since the verdict
>> prog might redirect or drop those skbs.
>>
>> There's no rush to merge this patch.
>>
>> Since the queue kick on bpf_map_update_elem addresses a closely related
>> issue, I think it makes sense to include it in this patchset for easier
>> tracking rather than splitting it out.
>>
>> I'll spend more time looking into this and come back with an updated version.
>>
>> Thanks,
>> Jiayuan
>>
>
> Hi Jakub,
>
> I've been thinking about this more, and I realize the problem is not as
> simple as it seems.
>
> Regarding kicking the sk_receive_queue on bpf_map_update_elem: the BPF
> program may not be fully initialized at that point. For example, with a
> redirect program, the destination fd might not yet be inserted into the
> map. If we kick the data through the BPF program immediately, the
> redirect lookup would fail, leading to unexpected behavior (data being
> dropped or passed to the wrong socket).
I reckon there is not much we can do about it because we have no control
over when userspace inserts or removes sockets from the sockmap. It can
happen at any time.

Also, a newly received segment can trigger the sk_data_ready callback, and
that would also cause the skbs to get processed. We don't have control of
that either.

Does this change break any of our existing tests/benchmarks or some
other setup of yours?

> I also considered triggering the kick in poll/select via
> sk_msg_is_readable(). However, this approach doesn't work for TCP
> because tcp_poll() -> tcp_stream_is_readable() -> tcp_epollin_ready()
> will return early when sk_receive_queue has data, before ever calling
> sk_is_readable().
>
> In the next version, I'll address your other nits and remove the
> sk_receive_queue check from tcp_bpf_ioctl. I'll also open an issue on
> the GH project to track this problem so we can continue exploring
> better solutions.

Sounds like a plan. Thanks!
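
For reference, the FIONREAD / map update / poll sequence proposed earlier in the thread can be sketched from userspace roughly as below. This is only an illustration under stated assumptions, not code from the patch: the helper names (readable_bytes, wait_readable, fionread_sequence) are made up, an AF_UNIX socketpair stands in for the TCP/UDP sockets under discussion so the sketch runs unprivileged, and the sockmap insertion itself is left as a comment because it needs a map fd and an attached verdict program.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Steps 1 and 5: ask how many bytes are readable on fd; -1 on error. */
int readable_bytes(int fd)
{
	int n = 0;

	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}

/* Step 3: wait for POLLIN; 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return ret > 0 && (pfd.revents & POLLIN);
}

/* Hypothetical driver: queue 5 bytes, then walk the proposed sequence.
 * Stores the byte counts seen before and after the (omitted) map update. */
int fionread_sequence(int *before, int *after)
{
	static const char msg[] = "hello";
	const ssize_t len = sizeof(msg) - 1;
	int sv[2];
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], msg, len) != len)
		goto out;

	*before = readable_bytes(sv[1]);	/* step 1 */

	/* Step 2 would go here (assumption, not executed in this sketch):
	 * bpf_map_update_elem(map_fd, &key, &sv[1], BPF_ANY);
	 */

	if (wait_readable(sv[1], 1000) != 1)	/* step 3 */
		goto out;

	*after = readable_bytes(sv[1]);		/* last step */
	ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Without a sockmap in the picture both counts trivially match. With the socket inserted and a verdict program attached, the second count is the one that could legitimately differ if the program redirects or drops the data.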

