Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API

Kumar Kartikeya Dwivedi wrote (Fri, Apr 2, 2021):
> [...]

Alexei Starovoitov wrote (Sat, Apr 3, 2021):
> All of these things are messy because of tc legacy. bpf tried to follow tc
> style with the cls and act distinction, and it didn't quite work: cls with
> direct-action is the only thing that became mainstream, while tc-style
> attach was never really addressed. There were several incidents where tc
> had tens of thousands of progs attached because of the attach/query/index
> weirdness described above. I think the only way to address this properly
> is to introduce a bpf_link style of attaching to tc. Such a bpf_link would
> support ingress/egress only, direct-action would be implied, there would
> be no index, and query would be obvious.
>
> So I would like to propose to take this patch set a step further from
> what Daniel said:
>
>     int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS});
>
> and make this proposed api return an FD. To detach from tc
> ingress/egress, just close(fd).

Kumar Kartikeya Dwivedi wrote:
> Note that we already have bpf_link support working (without support for
> pinning, of course) in a limited way. The (ifindex, protocol, parent_id,
> priority, handle, chain_index) tuple uniquely identifies a filter, so we
> stash this in the bpf_link and are able to operate on the exact filter
> during release.
>
> You mean adding an fd-based TC API to the kernel?

Alexei Starovoitov wrote:
> Except they're not unique. The library can stash them, but something else
> doing a detach via iproute2 or its own netlink calls will detach the
> prog. That other app can then attach a different prog to the same spot,
> and now bpf_link__destroy will be detaching somebody else's prog.
>
> [Re: an fd-based TC API in the kernel] yes.

Andrii Nakryiko wrote:
> I'm totally for bpf_link-based TC attachment.
>
> But I think *also* having "legacy" netlink-based APIs will allow
> applications to handle older kernels in a much nicer way, without an
> extra dependency on iproute2. We have a similar situation with kprobe,
> where libbpf currently only supports the "modern" fd-based attachment,
> but users periodically ask questions and struggle to figure out issues on
> older kernels that don't support the new APIs.

Toke Høiland-Jørgensen wrote (Tue, Apr 6, 2021):
> +1; I am OK with adding a new bpf_link-based way to attach TC programs,
> but we still need to support the netlink API in libbpf.

Andrii Nakryiko wrote:
> So I think we'd have to support the legacy TC APIs, but I agree with
> Alexei and Daniel that we should keep it to the simplest and most
> straightforward API: supporting direct-action attachments and setting up
> the qdisc transparently (if I'm getting all the terminology right, after
> reading Quentin's blog post). That should coincidentally match how the
> bpf_link-based TC API will look, so all of it can be abstracted behind a
> single bpf_link__attach_tc() API as well, right? That's the plan for
> dealing with kprobe right now, btw. Libbpf will detect the best available
> API and transparently fall back (maybe with some warning for awareness,
> due to the inherent downsides of the legacy APIs, no auto-cleanup being
> the most prominent one).

Toke Høiland-Jørgensen wrote:
> Yup, SGTM: expose both in the low-level API (in bpf.c), and make the
> high-level API auto-detect. That way users can also still use the netlink
> attach function if they don't want the fd-based auto-close behaviour of
> bpf_link.

Andrii Nakryiko wrote:
> So I thought a bit more about this, and it feels like the right move
> would be to expose only the higher-level TC BPF API behind bpf_link. It
> will keep the API complexity, and the number of APIs libbpf has to
> support, to a minimum, and will keep the API itself simple: direct attach
> with the minimum number of input arguments. By not exposing low-level
> APIs we also table the whole bpf_tc_cls_attach_id design discussion, as
> we can now keep as much info as needed inside bpf_link_tc (which will
> embed bpf_link internally as well) to support detachment and possibly
> some additional querying, if needed.

Toke Høiland-Jørgensen wrote:
> But then there would be no way for the caller to explicitly select a
> mechanism? I.e., if I write a BPF program using this mechanism targeting
> a 5.12 kernel, I'll get netlink attachment, which can stick around when I
> do bpf_link__disconnect(). But if the kernel then gets upgraded to
> support bpf_link for TC programs, I'll suddenly and transparently get
> bpf_link, and the attachments will go away unless I pin them. This
> seems... less than ideal?

Andrii Nakryiko wrote:
> That's what we are doing with bpf_program__attach_kprobe(), though. And
> so far I've only seen people (privately) saying how good it would be to
> have bpf_link-based TC APIs; doesn't seem like anyone with a realistic
> use case prefers the curren[...]
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Thu, Apr 15, 2021 at 8:57 AM Toke Høiland-Jørgensen wrote: > > Andrii Nakryiko writes: > > > On Wed, Apr 14, 2021 at 3:51 PM Toke Høiland-Jørgensen > > wrote: > >> > >> Andrii Nakryiko writes: > >> > >> > On Wed, Apr 14, 2021 at 3:58 AM Toke Høiland-Jørgensen > >> > wrote: > >> >> > >> >> Andrii Nakryiko writes: > >> >> > >> >> > On Tue, Apr 6, 2021 at 3:06 AM Toke Høiland-Jørgensen > >> >> > wrote: > >> >> >> > >> >> >> Andrii Nakryiko writes: > >> >> >> > >> >> >> > On Sat, Apr 3, 2021 at 10:47 AM Alexei Starovoitov > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> On Sat, Apr 03, 2021 at 12:38:06AM +0530, Kumar Kartikeya Dwivedi > >> >> >> >> wrote: > >> >> >> >> > On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov > >> >> >> >> > wrote: > >> >> >> >> > > On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi > >> >> >> >> > > wrote: > >> >> >> >> > > > [...] > >> >> >> >> > > > >> >> >> >> > > All of these things are messy because of tc legacy. bpf tried > >> >> >> >> > > to follow tc style > >> >> >> >> > > with cls and act distinction and it didn't quite work. cls > >> >> >> >> > > with > >> >> >> >> > > direct-action is the only > >> >> >> >> > > thing that became mainstream while tc style attach wasn't > >> >> >> >> > > really addressed. > >> >> >> >> > > There were several incidents where tc had tens of thousands > >> >> >> >> > > of progs attached > >> >> >> >> > > because of this attach/query/index weirdness described above. > >> >> >> >> > > I think the only way to address this properly is to introduce > >> >> >> >> > > bpf_link style of > >> >> >> >> > > attaching to tc. Such bpf_link would support ingress/egress > >> >> >> >> > > only. > >> >> >> >> > > direction-action will be implied. There won't be any index > >> >> >> >> > > and query > >> >> >> >> > > will be obvious. > >> >> >> >> > > >> >> >> >> > Note that we already have bpf_link support working (without > >> >> >> >> > support for pinning > >> >> >> >> > ofcourse) in a limited way. 
The ifindex, protocol, parent_id, > >> >> >> >> > priority, handle, > >> >> >> >> > chain_index tuple uniquely identifies a filter, so we stash > >> >> >> >> > this in the bpf_link > >> >> >> >> > and are able to operate on the exact filter during release. > >> >> >> >> > >> >> >> >> Except they're not unique. The library can stash them, but > >> >> >> >> something else > >> >> >> >> doing detach via iproute2 or their own netlink calls will detach > >> >> >> >> the prog. > >> >> >> >> This other app can attach to the same spot a different prog and > >> >> >> >> now > >> >> >> >> bpf_link__destroy will be detaching somebody else prog. > >> >> >> >> > >> >> >> >> > > So I would like to propose to take this patch set a step > >> >> >> >> > > further from > >> >> >> >> > > what Daniel said: > >> >> >> >> > > int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > >> >> >> >> > > and make this proposed api to return FD. > >> >> >> >> > > To detach from tc ingress/egress just close(fd). > >> >> >> >> > > >> >> >> >> > You mean adding an fd-based TC API to the kernel? > >> >> >> >> > >> >> >> >> yes. > >> >> >> > > >> >> >> > I'm totally for bpf_link-based TC attachment. > >> >> >> > > >> >> >> > But I think *also* having "legacy" netlink-based APIs will allow > >> >> >> > applications to handle older kernels in a much nicer way without > >> >> >> > extra > >> >> >> > dependency on iproute2. We have a similar situation with kprobe, > >> >> >> > where > >> >> >> > currently libbpf only supports "modern" fd-based attachment, but > >> >> >> > users > >> >> >> > periodically ask questions and struggle to figure out issues on > >> >> >> > older > >> >> >> > kernels that don't support new APIs. > >> >> >> > >> >> >> +1; I am OK with adding a new bpf_link-based way to attach TC > >> >> >> programs, > >> >> >> but we still need to support the netlink API in libbpf. 
> >> >> >> > >> >> >> > So I think we'd have to support legacy TC APIs, but I agree with > >> >> >> > Alexei and Daniel that we should keep it to the simplest and most > >> >> >> > straightforward API of supporting direction-action attachments and > >> >> >> > setting up qdisc transparently (if I'm getting all the terminology > >> >> >> > right, after reading Quentin's blog post). That coincidentally > >> >> >> > should > >> >> >> > probably match how bpf_link-based TC API will look like, so all > >> >> >> > that > >> >> >> > can be abstracted behind a single bpf_link__attach_tc() API as > >> >> >> > well, > >> >> >> > right? That's the plan for dealing with kprobe right now, btw. > >> >> >> > Libbpf > >> >> >> > will detect the best available API and transparently fall back > >> >> >> > (maybe > >> >> >> > with some warning for awareness, due to inherent downsides of > >> >> >> > legacy > >> >> >> > APIs: no auto-cleanup being the most prominent one). > >> >> >> > >> >> >> Yup, SGTM: Expose both in the low-level API (in bpf.c), and make the > >> >> >> high-l
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
Andrii Nakryiko writes: > On Wed, Apr 14, 2021 at 3:51 PM Toke Høiland-Jørgensen > wrote: >> >> Andrii Nakryiko writes: >> >> > On Wed, Apr 14, 2021 at 3:58 AM Toke Høiland-Jørgensen >> > wrote: >> >> >> >> Andrii Nakryiko writes: >> >> >> >> > On Tue, Apr 6, 2021 at 3:06 AM Toke Høiland-Jørgensen >> >> > wrote: >> >> >> >> >> >> Andrii Nakryiko writes: >> >> >> >> >> >> > On Sat, Apr 3, 2021 at 10:47 AM Alexei Starovoitov >> >> >> > wrote: >> >> >> >> >> >> >> >> On Sat, Apr 03, 2021 at 12:38:06AM +0530, Kumar Kartikeya Dwivedi >> >> >> >> wrote: >> >> >> >> > On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote: >> >> >> >> > > On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi >> >> >> >> > > wrote: >> >> >> >> > > > [...] >> >> >> >> > > >> >> >> >> > > All of these things are messy because of tc legacy. bpf tried >> >> >> >> > > to follow tc style >> >> >> >> > > with cls and act distinction and it didn't quite work. cls with >> >> >> >> > > direct-action is the only >> >> >> >> > > thing that became mainstream while tc style attach wasn't >> >> >> >> > > really addressed. >> >> >> >> > > There were several incidents where tc had tens of thousands of >> >> >> >> > > progs attached >> >> >> >> > > because of this attach/query/index weirdness described above. >> >> >> >> > > I think the only way to address this properly is to introduce >> >> >> >> > > bpf_link style of >> >> >> >> > > attaching to tc. Such bpf_link would support ingress/egress >> >> >> >> > > only. >> >> >> >> > > direction-action will be implied. There won't be any index and >> >> >> >> > > query >> >> >> >> > > will be obvious. >> >> >> >> > >> >> >> >> > Note that we already have bpf_link support working (without >> >> >> >> > support for pinning >> >> >> >> > ofcourse) in a limited way. 
The ifindex, protocol, parent_id, >> >> >> >> > priority, handle, >> >> >> >> > chain_index tuple uniquely identifies a filter, so we stash this >> >> >> >> > in the bpf_link >> >> >> >> > and are able to operate on the exact filter during release. >> >> >> >> >> >> >> >> Except they're not unique. The library can stash them, but >> >> >> >> something else >> >> >> >> doing detach via iproute2 or their own netlink calls will detach >> >> >> >> the prog. >> >> >> >> This other app can attach to the same spot a different prog and now >> >> >> >> bpf_link__destroy will be detaching somebody else prog. >> >> >> >> >> >> >> >> > > So I would like to propose to take this patch set a step >> >> >> >> > > further from >> >> >> >> > > what Daniel said: >> >> >> >> > > int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): >> >> >> >> > > and make this proposed api to return FD. >> >> >> >> > > To detach from tc ingress/egress just close(fd). >> >> >> >> > >> >> >> >> > You mean adding an fd-based TC API to the kernel? >> >> >> >> >> >> >> >> yes. >> >> >> > >> >> >> > I'm totally for bpf_link-based TC attachment. >> >> >> > >> >> >> > But I think *also* having "legacy" netlink-based APIs will allow >> >> >> > applications to handle older kernels in a much nicer way without >> >> >> > extra >> >> >> > dependency on iproute2. We have a similar situation with kprobe, >> >> >> > where >> >> >> > currently libbpf only supports "modern" fd-based attachment, but >> >> >> > users >> >> >> > periodically ask questions and struggle to figure out issues on older >> >> >> > kernels that don't support new APIs. >> >> >> >> >> >> +1; I am OK with adding a new bpf_link-based way to attach TC programs, >> >> >> but we still need to support the netlink API in libbpf. 
>> >> >> >> >> >> > So I think we'd have to support legacy TC APIs, but I agree with >> >> >> > Alexei and Daniel that we should keep it to the simplest and most >> >> >> > straightforward API of supporting direction-action attachments and >> >> >> > setting up qdisc transparently (if I'm getting all the terminology >> >> >> > right, after reading Quentin's blog post). That coincidentally should >> >> >> > probably match how bpf_link-based TC API will look like, so all that >> >> >> > can be abstracted behind a single bpf_link__attach_tc() API as well, >> >> >> > right? That's the plan for dealing with kprobe right now, btw. Libbpf >> >> >> > will detect the best available API and transparently fall back (maybe >> >> >> > with some warning for awareness, due to inherent downsides of legacy >> >> >> > APIs: no auto-cleanup being the most prominent one). >> >> >> >> >> >> Yup, SGTM: Expose both in the low-level API (in bpf.c), and make the >> >> >> high-level API auto-detect. That way users can also still use the >> >> >> netlink attach function if they don't want the fd-based auto-close >> >> >> behaviour of bpf_link. >> >> > >> >> > So I thought a bit more about this, and it feels like the right move >> >> > would be to expose only higher-level TC BPF API behind bpf_link. It >> >> > will keep the API complexity and amount of APIs that libbpf will have >> >> > to suppo
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Wed, Apr 14, 2021 at 4:32 PM Daniel Borkmann wrote: > > On 4/15/21 1:19 AM, Andrii Nakryiko wrote: > > On Wed, Apr 14, 2021 at 3:51 PM Toke Høiland-Jørgensen > > wrote: > >> Andrii Nakryiko writes: > >>> On Wed, Apr 14, 2021 at 3:58 AM Toke Høiland-Jørgensen > >>> wrote: > Andrii Nakryiko writes: > > On Tue, Apr 6, 2021 at 3:06 AM Toke Høiland-Jørgensen > > wrote: > >> Andrii Nakryiko writes: > >>> On Sat, Apr 3, 2021 at 10:47 AM Alexei Starovoitov > >>> wrote: > On Sat, Apr 03, 2021 at 12:38:06AM +0530, Kumar Kartikeya Dwivedi > wrote: > > On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote: > >> On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi > >> wrote: > >>> [...] > >> > >> All of these things are messy because of tc legacy. bpf tried to > >> follow tc style > >> with cls and act distinction and it didn't quite work. cls with > >> direct-action is the only > >> thing that became mainstream while tc style attach wasn't really > >> addressed. > >> There were several incidents where tc had tens of thousands of > >> progs attached > >> because of this attach/query/index weirdness described above. > >> I think the only way to address this properly is to introduce > >> bpf_link style of > >> attaching to tc. Such bpf_link would support ingress/egress only. > >> direction-action will be implied. There won't be any index and > >> query > >> will be obvious. > > > > Note that we already have bpf_link support working (without support > > for pinning > > ofcourse) in a limited way. The ifindex, protocol, parent_id, > > priority, handle, > > chain_index tuple uniquely identifies a filter, so we stash this in > > the bpf_link > > and are able to operate on the exact filter during release. > > Except they're not unique. The library can stash them, but something > else > doing detach via iproute2 or their own netlink calls will detach the > prog. 
> This other app can attach to the same spot a different prog and now > bpf_link__destroy will be detaching somebody else prog. > > >> So I would like to propose to take this patch set a step further > >> from > >> what Daniel said: > >> int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > >> and make this proposed api to return FD. > >> To detach from tc ingress/egress just close(fd). > > > > You mean adding an fd-based TC API to the kernel? > > yes. > >>> > >>> I'm totally for bpf_link-based TC attachment. > >>> > >>> But I think *also* having "legacy" netlink-based APIs will allow > >>> applications to handle older kernels in a much nicer way without extra > >>> dependency on iproute2. We have a similar situation with kprobe, where > >>> currently libbpf only supports "modern" fd-based attachment, but users > >>> periodically ask questions and struggle to figure out issues on older > >>> kernels that don't support new APIs. > >> > >> +1; I am OK with adding a new bpf_link-based way to attach TC programs, > >> but we still need to support the netlink API in libbpf. > >> > >>> So I think we'd have to support legacy TC APIs, but I agree with > >>> Alexei and Daniel that we should keep it to the simplest and most > >>> straightforward API of supporting direction-action attachments and > >>> setting up qdisc transparently (if I'm getting all the terminology > >>> right, after reading Quentin's blog post). That coincidentally should > >>> probably match how bpf_link-based TC API will look like, so all that > >>> can be abstracted behind a single bpf_link__attach_tc() API as well, > >>> right? That's the plan for dealing with kprobe right now, btw. Libbpf > >>> will detect the best available API and transparently fall back (maybe > >>> with some warning for awareness, due to inherent downsides of legacy > >>> APIs: no auto-cleanup being the most prominent one). 
> >> > >> Yup, SGTM: Expose both in the low-level API (in bpf.c), and make the > >> high-level API auto-detect. That way users can also still use the > >> netlink attach function if they don't want the fd-based auto-close > >> behaviour of bpf_link. > > > > So I thought a bit more about this, and it feels like the right move > > would be to expose only higher-level TC BPF API behind bpf_link. It > > will keep the API complexity and amount of APIs that libbpf will have > > to support to the minimum, and will keep the API itself simple: > > direct-attach with the minimum amount of input arguments. By not > > exposing l
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On 4/15/21 1:19 AM, Andrii Nakryiko wrote:
> On Wed, Apr 14, 2021 at 3:51 PM Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko writes:
>>> On Wed, Apr 14, 2021 at 3:58 AM Toke Høiland-Jørgensen wrote:
>>>> Andrii Nakryiko writes:
>>>>> On Tue, Apr 6, 2021 at 3:06 AM Toke Høiland-Jørgensen wrote:
>>>>>> Andrii Nakryiko writes:
>>>>>>> On Sat, Apr 3, 2021 at 10:47 AM Alexei Starovoitov wrote:
>>>>>>>> On Sat, Apr 03, 2021 at 12:38:06AM +0530, Kumar Kartikeya Dwivedi wrote:
>>>>>>>>> On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote:
>>>>>>>>>> On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi wrote:
>>>>>>>>>>> [...]
>>>>>>>>>>
>>>>>>>>>> All of these things are messy because of tc legacy. bpf tried to follow tc style with cls and act distinction and it didn't quite work. cls with direct-action is the only thing that became mainstream, while tc-style attach wasn't really addressed. There were several incidents where tc had tens of thousands of progs attached because of this attach/query/index weirdness described above. I think the only way to address this properly is to introduce a bpf_link style of attaching to tc. Such a bpf_link would support ingress/egress only. direct-action will be implied. There won't be any index, and query will be obvious.
>>>>>>>>>
>>>>>>>>> Note that we already have bpf_link support working (without support for pinning, of course) in a limited way. The ifindex, protocol, parent_id, priority, handle, chain_index tuple uniquely identifies a filter, so we stash this in the bpf_link and are able to operate on the exact filter during release.
>>>>>>>>
>>>>>>>> Except they're not unique. The library can stash them, but something else doing detach via iproute2 or its own netlink calls will detach the prog. This other app can attach a different prog to the same spot, and now bpf_link__destroy will be detaching somebody else's prog.
>>>>>>>>
>>>>>>>>>> So I would like to propose to take this patch set a step further from what Daniel said: int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): and make this proposed api return an FD. To detach from tc ingress/egress, just close(fd).
>>>>>>>>>
>>>>>>>>> You mean adding an fd-based TC API to the kernel?
>>>>>>>>
>>>>>>>> yes.
>>>>>>>
>>>>>>> I'm totally for bpf_link-based TC attachment. But I think *also* having "legacy" netlink-based APIs will allow applications to handle older kernels in a much nicer way without an extra dependency on iproute2. We have a similar situation with kprobe, where currently libbpf only supports "modern" fd-based attachment, but users periodically ask questions and struggle to figure out issues on older kernels that don't support the new APIs.
>>>>>>
>>>>>> +1; I am OK with adding a new bpf_link-based way to attach TC programs, but we still need to support the netlink API in libbpf.
>>>>>
>>>>> So I think we'd have to support legacy TC APIs, but I agree with Alexei and Daniel that we should keep it to the simplest and most straightforward API of supporting direct-action attachments and setting up the qdisc transparently (if I'm getting all the terminology right, after reading Quentin's blog post). That coincidentally should probably match what the bpf_link-based TC API will look like, so all of that can be abstracted behind a single bpf_link__attach_tc() API as well, right? That's the plan for dealing with kprobe right now, btw. Libbpf will detect the best available API and transparently fall back (maybe with some warning for awareness, due to inherent downsides of legacy APIs: no auto-cleanup being the most prominent one).
>>>>
>>>> Yup, SGTM: Expose both in the low-level API (in bpf.c), and make the high-level API auto-detect. That way users can also still use the netlink attach function if they don't want the fd-based auto-close behaviour of bpf_link.
>>>
>>> So I thought a bit more about this, and it feels like the right move would be to expose only the higher-level TC BPF API behind bpf_link. It will keep the API complexity and the amount of APIs that libbpf will have to support to a minimum, and will keep the API itself simple: direct-attach with the minimum amount of input arguments. By not exposing low-level APIs we also table the whole bpf_tc_cls_attach_id design discussion, as we can now keep as much info as needed inside bpf_link_tc (which will embed bpf_link internally as well) to support detachment and possibly some additional querying, if needed.
>>
>> But then there would be no way for the caller to explicitly select a mechanism? I.e., if I write a BPF program using this mechanism targeting a 5.12 kernel, I'll get netlink attachment, which can stick around when I do bpf_link__disconnect(). But then if the kernel gets upgraded to support bpf_link for TC programs, I'll suddenly and transparently get bpf_link, and the attachments will go away unless I pin them. This seems... less than ideal?
>
> That's what we are doing with bpf_program__attach_kprobe(), though. And so far I've only seen people (privately) saying how good it would be to have bpf_link-based TC APIs; it doesn't seem like anyone with a realistic use case prefers the current APIs. So I suspect it's not going to be a problem in practice. But at least I'd start there and see
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
So I thought a bit more about this, and it feels like the right move would be to expose only the higher-level TC BPF API behind bpf_link. It will keep the API complexity and the amount of APIs that libbpf will have to support to a minimum, and will keep the API itself simple: direct-attach with the minimum amount of input arguments. By not exposing low-level APIs we also table the whole bpf_tc_cls_attach_id design discussion, as we can now keep as much info as needed inside bpf_link_tc (which will embed bpf_link internally as well) to support detachment and possibly some additional querying, if needed.

I think that's the best and least controversial step forward for getting this API into libbpf.
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Mon, Apr 05, 2021 at 10:51:09PM IST, Andrii Nakryiko wrote:
>
> [...]
>
> if _block variant is just a special ifindex value, then it should be fine for users to know such a detail (we can leave a comment mentioning this specifically), especially given it's not a very popular thing. Almost doubling the amount of APIs just for this doesn't make much sense, IMO.

Ok.

> If we know that we need a variant with options, I'd vote for having just one bpf_tc_attach() API which always takes options. Passing NULL for opts is simple, no need for two APIs, I think.

Ack.

> Which parts of that id struct are data that the caller might not know or can't know? Is it handle and chain_index? Or just one of them? Or?... If there is something that has to be returned back, I'd keep only that, instead of returning 6+ fields, most of which the user should already know.

The user will know ifindex and parent_id, and perhaps protocol (defaulting to ETH_P_ALL if they don't supply one). Other fields like handle, priority and chain_index can all be kernel-assigned, so keeping those still makes sense. I'll change this in v2.

--
Kartikeya
But I think *also* having "legacy" netlink-based APIs will allow applications to handle older kernels in a much nicer way without extra dependency on iproute2. We have a similar situation with kprobe, where currently libbpf only supports "modern" fd-based attachment, but users periodically ask questions and struggle to figure out issues on older kernels that don't support new APIs. So I think we'd have to support legacy TC APIs, but I agree with Alexei and Daniel that we should keep it to the simplest and most straightforward API of supporting direction-action attachments and setting up qdisc transparently (if I'm getting all the terminology right, after reading Quentin's blog post). That coincidentally should probably match how bpf_link-based TC API will look like, so all that can be abstracted behind a single bpf_link__attach_tc() API as well, right? That's the plan for dealing with kprobe right now, btw. Libbpf will detect the best available API and transparently fall back (maybe with some warning for awareness, due to inherent downsides of legacy APIs: no auto-cleanup being the most prominent one). > > > > The user processes will not conflict with each other and will not > > > accidently > > > detach bpf program that was attached by another user process. > > > Such api will address the existing tc query/attach/detach race race > > > conditions. > > > > Hmm, I think we do solve the race condition by returning the id. As long as > > you > > don't misuse the interface and go around deleting filters arbitrarily (i.e. > > only > > detach using the id), programs won't step over each other's filters. > > Storing the > > id from the netlink response received during detach also eliminates any > > ambigiuity from probing through get_info after attach. Same goes for > > actions, > > and the same applies to the bpf_link returning API (which stashes id/index). > > There are plenty of tools and libraries out there that do attach/detach of bpf > to tc. 
Everyone is not going to convert to this new libbpf api overnight. > So 'misuse of the interface' is not a misuse. It's a reality that is going to > keep > happening unless the kernel guarantees ownership of the attachment via FD. > > > The only advantage of fd would be the possibility of pinning it, and > > extending > > lifetime of the filter. > > Pinning is one of the advantages. The main selling point of FD is ownership > of the attachment.
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi wrote: > > On Fri, Apr 02, 2021 at 05:49:29AM IST, Daniel Borkmann wrote: > > On 3/31/21 11:44 AM, Kumar Kartikeya Dwivedi wrote: > > > On Wed, Mar 31, 2021 at 02:55:47AM IST, Daniel Borkmann wrote: > > > > Do we even need the _block variant? I would rather prefer to take the > > > > chance > > > > and make it as simple as possible, and only iff really needed extend > > > > with > > > > other APIs, for example: > > > > > > The block variant can be dropped, I'll use the TC_BLOCK/TC_DEV > > > alternative which > > > sets parent_id/ifindex properly. > > > > > > >bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); > > > > > > > > Internally, this will create the sch_clsact qdisc & cls_bpf filter > > > > instance > > > > iff not present yet, and attach to a default prio 1 handle 1, and > > > > _always_ in > > > > direct-action mode. This is /as simple as it gets/ and we don't need to > > > > bother > > > > users with more complex tc/cls_bpf internals unless desired. For > > > > example, > > > > extended APIs could add prio/parent so that multi-prog can be attached > > > > to a > > > > single cls_bpf instance, but even that could be a second step, imho. > > > > > > I am not opposed to clsact qdisc setup if INGRESS/EGRESS is supplied (not > > > sure > > > how others feel about it). > > > > What speaks against it? It would be 100% clear from API side where the prog > > is > > being attached. Same as with tc cmdline where you specify > > 'ingress'/'egress'. > > > > Ok, I will add the qdisc setup in the next revision. > > > > We could make direct_action mode default, and similarly choose prio > > > > To be honest, I wouldn't even support a mode from the lib/API side where > > direct_action > > is not set. It should always be forced to true. Everything else is rather > > broken > > setup-wise, imho, since it won't scale. 
We added direct_action a bit later > > to the > > kernel than original cls_bpf, but if I would do it again today, I'd make it > > the > > only available option. I don't see a reasonable use-case where you have it > > to false. > > > > I'm all for doing that, but in some sense that also speaks against SCHED_ACT > support. Currently, you can load SCHED_ACT programs using this series, but not > really bind them to classifier. I left that option open to a future patch, it > would just reuse the existing tc_act_add_action helper (also why I kept it in > its own function). Maybe we need to reconsider that, if direct action is the > only recommended way going forward (to discourage people from using > SCHED_ACT), > or just add opts to do all the setup in low level API, instead of leaving it > incomplete. > > > > as 1 by default instead of letting the kernel do it. Then you can just > > > pass in > > > NULL for bpf_tc_cls_opts and be close to what you're proposing. For > > > protocol we > > > can choose ETH_P_ALL by default too if the user doesn't set it. > > > > Same here with ETH_P_ALL, I'm not sure anyone uses anything other than > > ETH_P_ALL, > > so yes, that should be default. > > Ack. > > > > > > With these modifications, the equivalent would look like > > > bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, INGRESS), NULL, &id); > > > > Few things compared to bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > > > > 1) nit, but why even 'cls' in the name. I think we shouldn't expose such > > old-days > >tc semantics to a user. Just bpf_tc_attach() is cleaner/simpler to > > understand. > > Since it would make it clear this is for SCHED_CLS progs, likewise > bpf_tc_act_* > is for SCHED_ACT progs. Not opposed to changing the name. > > > 2) What's the 'TC_DEV(ifindex, INGRESS)' macro doing exactly? Looks > > unnecessary, > >why not regular args to the API? 
> > It is very easy to support BLOCK (I know it's not really popular here, but I > think if supporting it just requires adding a macro, then we can go ahead). So > the user can use TC_BLOCK(block_idx) instead of remembering ifindex is to be set > to TCM_IFINDEX_MAGIC_BLOCK and parent_id to actual block index. It will just > expand to: > > #define TC_BLOCK(block_idx) TCM_IFINDEX_MAGIC_BLOCK, (block_idx) > > TC_DEV macro can be dropped, since user can directly pass ifindex and > parent_id. If _block variant is just a special ifindex value, then it should be fine for users to know such a detail (we can leave a comment mentioning this specifically), especially given it's not a very popular thing. Almost doubling amount of APIs just for this doesn't make much sense, IMO. > > > 3) Exposed bpf_tc_attach() API could internally call a bpf_tc_attach_opts() > > API > > with preset defaults, and the latter could have all the custom bits if > > the user > > needs to go beyond the simple API, so from your bpf_tc_cls_attach() I'd > > also > > drop the NULL. > > Ok, this is probably better (but maybe we can do this for the high-level > bpf_program__attach that returns a bpf_link * instead of introducing yet another function).
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Sat, Apr 03, 2021 at 12:38:06AM +0530, Kumar Kartikeya Dwivedi wrote: > On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote: > > On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi > > wrote: > > > [...] > > > > All of these things are messy because of tc legacy. bpf tried to follow tc > > style > > with cls and act distinction and it didn't quite work. cls with > > direct-action is the only > > thing that became mainstream while tc style attach wasn't really addressed. > > There were several incidents where tc had tens of thousands of progs > > attached > > because of this attach/query/index weirdness described above. > > I think the only way to address this properly is to introduce bpf_link > > style of > > attaching to tc. Such bpf_link would support ingress/egress only. > > direction-action will be implied. There won't be any index and query > > will be obvious. > > Note that we already have bpf_link support working (without support for > pinning > ofcourse) in a limited way. The ifindex, protocol, parent_id, priority, > handle, > chain_index tuple uniquely identifies a filter, so we stash this in the > bpf_link > and are able to operate on the exact filter during release. Except they're not unique. The library can stash them, but something else doing detach via iproute2 or their own netlink calls will detach the prog. This other app can attach to the same spot a different prog and now bpf_link__destroy will be detaching somebody else prog. > > So I would like to propose to take this patch set a step further from > > what Daniel said: > > int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > > and make this proposed api to return FD. > > To detach from tc ingress/egress just close(fd). > > You mean adding an fd-based TC API to the kernel? yes. > > The user processes will not conflict with each other and will not accidently > > detach bpf program that was attached by another user process. 
> > Such api will address the existing tc query/attach/detach race > > conditions. > > Hmm, I think we do solve the race condition by returning the id. As long as > you > don't misuse the interface and go around deleting filters arbitrarily (i.e. > only > detach using the id), programs won't step over each other's filters. Storing > the > id from the netlink response received during detach also eliminates any > ambiguity from probing through get_info after attach. Same goes for actions, > and the same applies to the bpf_link returning API (which stashes id/index). There are plenty of tools and libraries out there that do attach/detach of bpf to tc. Everyone is not going to convert to this new libbpf api overnight. So 'misuse of the interface' is not a misuse. It's a reality that is going to keep happening unless the kernel guarantees ownership of the attachment via FD. > The only advantage of fd would be the possibility of pinning it, and extending > lifetime of the filter. Pinning is one of the advantages. The main selling point of FD is ownership of the attachment.
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote: > On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi > wrote: > > [...] > > All of these things are messy because of tc legacy. bpf tried to follow tc > style > with cls and act distinction and it didn't quite work. cls with > direct-action is the only > thing that became mainstream while tc style attach wasn't really addressed. > There were several incidents where tc had tens of thousands of progs attached > because of this attach/query/index weirdness described above. > I think the only way to address this properly is to introduce bpf_link style > of > attaching to tc. Such bpf_link would support ingress/egress only. > direct-action will be implied. There won't be any index and query > will be obvious. Note that we already have bpf_link support working (without support for pinning of course) in a limited way. The ifindex, protocol, parent_id, priority, handle, chain_index tuple uniquely identifies a filter, so we stash this in the bpf_link and are able to operate on the exact filter during release. > So I would like to propose to take this patch set a step further from > what Daniel said: > int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > and make this proposed api to return FD. > To detach from tc ingress/egress just close(fd). You mean adding an fd-based TC API to the kernel? > The user processes will not conflict with each other and will not accidentally > detach bpf program that was attached by another user process. > Such api will address the existing tc query/attach/detach race > conditions. Hmm, I think we do solve the race condition by returning the id. As long as you don't misuse the interface and go around deleting filters arbitrarily (i.e. only detach using the id), programs won't step over each other's filters. Storing the id from the netlink response received during detach also eliminates any ambiguity from probing through get_info after attach.
Same goes for actions, and the same applies to the bpf_link returning API (which stashes id/index). Do you have any other example that can still be racy given the current API? The only advantage of fd would be the possibility of pinning it, and extending lifetime of the filter. > And libbpf side of support for this api will be trivial. Single bpf > link_create command > with ifindex and ingress|egress arguments. > wdyt? -- Kartikeya
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi wrote: > > This would be fine, because it's not a fast path or anything, but right now we > return the id using the netlink response, otherwise for query we have to open > the socket, prepare the msg, send and recv again. So it's a minor > optimization. > > However, there's one other problem. In an earlier version of this series, I > didn't keep the id/index out parameters (to act as handle to the newly > attached > filter/action). This led to problems on query. Suppose a user doesn't > properly > fill the opts during query (e.g. in case of filters). This means the netlink > dump includes all filters matching filled in attributes. If the prog_id for > all > of these is same (e.g. all have same bpf classifier prog attached to them), it > becomes impossible to determine which one is the filter user asked for. It is > not possible to enforce filling in all kinds of attributes since some can be > left out and assigned by default in the kernel (priority, chain_index etc.). > So > returning the newly created filter's id turned out to be the best option. This > is also used to stash filter related information in bpf_link to properly > release > it later. > > The same problem happens with actions, where we look up using the prog_id; > multiple actions with different indexes can match the same prog_id. It is not > possible to determine which index corresponds to the last loaded action. > > So unless there's a better idea on how to deal with this, a query API won't > work > for the case where same bpf prog is attached more than once. Returning the > id/index during attach seemed better than all other options we considered. All of these things are messy because of tc legacy. bpf tried to follow tc style with cls and act distinction and it didn't quite work. cls with direct-action is the only thing that became mainstream while tc style attach wasn't really addressed.
There were several incidents where tc had tens of thousands of progs attached because of this attach/query/index weirdness described above. I think the only way to address this properly is to introduce bpf_link style of attaching to tc. Such bpf_link would support ingress/egress only. direct-action will be implied. There won't be any index and query will be obvious. So I would like to propose to take this patch set a step further from what Daniel said: int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): and make this proposed api to return FD. To detach from tc ingress/egress just close(fd). The user processes will not conflict with each other and will not accidentally detach bpf program that was attached by another user process. Such api will address the existing tc query/attach/detach race conditions. And libbpf side of support for this api will be trivial. Single bpf link_create command with ifindex and ingress|egress arguments. wdyt?
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Fri, Apr 02, 2021 at 05:49:29AM IST, Daniel Borkmann wrote: > On 3/31/21 11:44 AM, Kumar Kartikeya Dwivedi wrote: > > On Wed, Mar 31, 2021 at 02:55:47AM IST, Daniel Borkmann wrote: > > > Do we even need the _block variant? I would rather prefer to take the > > > chance > > > and make it as simple as possible, and only iff really needed extend with > > > other APIs, for example: > > > > The block variant can be dropped, I'll use the TC_BLOCK/TC_DEV alternative > > which > > sets parent_id/ifindex properly. > > > > >bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); > > > > > > Internally, this will create the sch_clsact qdisc & cls_bpf filter > > > instance > > > iff not present yet, and attach to a default prio 1 handle 1, and > > > _always_ in > > > direct-action mode. This is /as simple as it gets/ and we don't need to > > > bother > > > users with more complex tc/cls_bpf internals unless desired. For example, > > > extended APIs could add prio/parent so that multi-prog can be attached to > > > a > > > single cls_bpf instance, but even that could be a second step, imho. > > > > I am not opposed to clsact qdisc setup if INGRESS/EGRESS is supplied (not > > sure > > how others feel about it). > > What speaks against it? It would be 100% clear from API side where the prog is > being attached. Same as with tc cmdline where you specify 'ingress'/'egress'. > Ok, I will add the qdisc setup in the next revision. > > We could make direct_action mode default, and similarly choose prio > > To be honest, I wouldn't even support a mode from the lib/API side where > direct_action > is not set. It should always be forced to true. Everything else is rather > broken > setup-wise, imho, since it won't scale. We added direct_action a bit later to > the > kernel than original cls_bpf, but if I would do it again today, I'd make it > the > only available option. I don't see a reasonable use-case where you have it to > false. 
> I'm all for doing that, but in some sense that also speaks against SCHED_ACT support. Currently, you can load SCHED_ACT programs using this series, but not really bind them to classifier. I left that option open to a future patch, it would just reuse the existing tc_act_add_action helper (also why I kept it in its own function). Maybe we need to reconsider that, if direct action is the only recommended way going forward (to discourage people from using SCHED_ACT), or just add opts to do all the setup in low level API, instead of leaving it incomplete. > > as 1 by default instead of letting the kernel do it. Then you can just pass > > in > > NULL for bpf_tc_cls_opts and be close to what you're proposing. For > > protocol we > > can choose ETH_P_ALL by default too if the user doesn't set it. > > Same here with ETH_P_ALL, I'm not sure anyone uses anything other than > ETH_P_ALL, > so yes, that should be default. Ack. > > > With these modifications, the equivalent would look like > > bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, INGRESS), NULL, &id); > > Few things compared to bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > > 1) nit, but why even 'cls' in the name. I think we shouldn't expose such > old-days >tc semantics to a user. Just bpf_tc_attach() is cleaner/simpler to > understand. Since it would make it clear this is for SCHED_CLS progs, likewise bpf_tc_act_* is for SCHED_ACT progs. Not opposed to changing the name. > 2) What's the 'TC_DEV(ifindex, INGRESS)' macro doing exactly? Looks > unnecessary, >why not regular args to the API? It is very easy to support BLOCK (I know it's not really popular here, but I think if supporting it just requires adding a macro, then we can go ahead). So the user can use TC_BLOCK(block_idx) instead of remembering ifindex is to be set to TCM_IFINDEX_MAGIC_BLOCK and parent_id to actual block index. 
It will just expand to: #define TC_BLOCK(block_idx) TCM_IFINDEX_MAGIC_BLOCK, (block_idx) TC_DEV macro can be dropped, since user can directly pass ifindex and parent_id. > 3) Exposed bpf_tc_attach() API could internally call a bpf_tc_attach_opts() > API > with preset defaults, and the latter could have all the custom bits if the > user > needs to go beyond the simple API, so from your bpf_tc_cls_attach() I'd > also > drop the NULL. Ok, this is probably better (but maybe we can do this for the high-level bpf_program__attach that returns a bpf_link * instead of introducing yet another function). > 4) For the simple API I'd likely also drop the id (you could have a query API > if > needed). > This would be fine, because it's not a fast path or anything, but right now we return the id using the netlink response, otherwise for query we have to open the socket, prepare the msg, send and recv again. So it's a minor optimization. However, there's one other problem. In an earlier version of this series, I didn't keep the id/index out parameters (to act as handle to the newly attached filter/action). This led to problems on query.
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On 3/31/21 11:44 AM, Kumar Kartikeya Dwivedi wrote: On Wed, Mar 31, 2021 at 02:55:47AM IST, Daniel Borkmann wrote: Do we even need the _block variant? I would rather prefer to take the chance and make it as simple as possible, and only iff really needed extend with other APIs, for example: The block variant can be dropped, I'll use the TC_BLOCK/TC_DEV alternative which sets parent_id/ifindex properly. bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); Internally, this will create the sch_clsact qdisc & cls_bpf filter instance iff not present yet, and attach to a default prio 1 handle 1, and _always_ in direct-action mode. This is /as simple as it gets/ and we don't need to bother users with more complex tc/cls_bpf internals unless desired. For example, extended APIs could add prio/parent so that multi-prog can be attached to a single cls_bpf instance, but even that could be a second step, imho. I am not opposed to clsact qdisc setup if INGRESS/EGRESS is supplied (not sure how others feel about it). What speaks against it? It would be 100% clear from API side where the prog is being attached. Same as with tc cmdline where you specify 'ingress'/'egress'. We could make direct_action mode default, and similarly choose prio To be honest, I wouldn't even support a mode from the lib/API side where direct_action is not set. It should always be forced to true. Everything else is rather broken setup-wise, imho, since it won't scale. We added direct_action a bit later to the kernel than original cls_bpf, but if I would do it again today, I'd make it the only available option. I don't see a reasonable use-case where you have it to false. as 1 by default instead of letting the kernel do it. Then you can just pass in NULL for bpf_tc_cls_opts and be close to what you're proposing. For protocol we can choose ETH_P_ALL by default too if the user doesn't set it. Same here with ETH_P_ALL, I'm not sure anyone uses anything other than ETH_P_ALL, so yes, that should be default. 
With these modifications, the equivalent would look like bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, INGRESS), NULL, &id); Few things compared to bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): 1) nit, but why even 'cls' in the name. I think we shouldn't expose such old-days tc semantics to a user. Just bpf_tc_attach() is cleaner/simpler to understand. 2) What's the 'TC_DEV(ifindex, INGRESS)' macro doing exactly? Looks unnecessary, why not regular args to the API? 3) Exposed bpf_tc_attach() API could internally call a bpf_tc_attach_opts() API with preset defaults, and the latter could have all the custom bits if the user needs to go beyond the simple API, so from your bpf_tc_cls_attach() I'd also drop the NULL. 4) For the simple API I'd likely also drop the id (you could have a query API if needed). So as long as the user doesn't care about other details, they can just pass opts as NULL.
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
Daniel Borkmann writes: > On 3/30/21 10:39 PM, Andrii Nakryiko wrote: >> On Sun, Mar 28, 2021 at 1:11 AM Kumar Kartikeya Dwivedi >> wrote: >>> On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: Is there some succinct but complete enough documentation/tutorial/etc that I can reasonably read to understand kernel APIs provided by TC (w.r.t. BPF, of course). I'm trying to wrap my head around this and whether API makes sense or not. Please share links, if you have some. >>> >>> Hi Andrii, >>> >>> Unfortunately for the kernel API part, I couldn't find any when I was >>> working >>> on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, >>> m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, >>> act_api.c, >>> act_bpf.c) to grok anything I didn't understand. There's also similar code >>> in >>> libnl (lib/route/{act,cls}.c). >>> >>> Other than that, these resources were useful (perhaps you already went >>> through >>> some/all of them): >>> >>> https://docs.cilium.io/en/latest/bpf/#tc-traffic-control >>> https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ >>> tc(8), and tc-bpf(8) man pages >>> >>> I hope this is helpful! >> >> Thanks! I'll take a look. Sorry, I'm a bit behind with all the stuff, >> trying to catch up. >> >> I was just wondering if it would be more natural instead of having >> _dev _block variants and having to specify __u32 ifindex, __u32 >> parent_id, __u32 protocol, to have some struct specifying TC >> "destination"? Maybe not, but I thought I'd bring this up early. So >> you'd have just bpf_tc_cls_attach(), and you'd so something like >> >> bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, parent_id, protocol)) >> >> or >> >> bpf_tc_cls_attach(prog_fd, TC_BLOCK(block_idx, protocol)) >> >> ? Or it's taking it too far? >> >> But even if not, I think detaching can be unified between _dev and >> _block, can't it? > > Do we even need the _block variant? 
I would rather prefer to take the chance > and make it as simple as possible, and only iff really needed extend with > other APIs, for example: > >bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); > > Internally, this will create the sch_clsact qdisc & cls_bpf filter instance > iff not present yet, and attach to a default prio 1 handle 1, and _always_ in > direct-action mode. This is /as simple as it gets/ and we don't need to bother > users with more complex tc/cls_bpf internals unless desired. For example, > extended APIs could add prio/parent so that multi-prog can be attached to a > single cls_bpf instance, but even that could be a second step, imho. While I'm all for simplifying where possible, the question becomes at what level? I.e., we initially figured we'd expose (most of) the netlink API in the low-level API (patch 3 in the series) and then have the bpf_program__* level API be the simple "just attach" one... We could simplify the low-level one further, of course, for instance by getting rid of the block stuff entirely, but I don't see much value in leaving out the support for prio/parent in the bpf_tc_cls_* - we'd have to make the API extensible so it could be added later anyway, so why not just include it from the get-go (especially as Kumar has already written the code?) -Toke
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Wed, Mar 31, 2021 at 02:55:47AM IST, Daniel Borkmann wrote: > Do we even need the _block variant? I would rather prefer to take the chance > and make it as simple as possible, and only iff really needed extend with > other APIs, for example: The block variant can be dropped, I'll use the TC_BLOCK/TC_DEV alternative which sets parent_id/ifindex properly. > > bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); > > Internally, this will create the sch_clsact qdisc & cls_bpf filter instance > iff not present yet, and attach to a default prio 1 handle 1, and _always_ in > direct-action mode. This is /as simple as it gets/ and we don't need to bother > users with more complex tc/cls_bpf internals unless desired. For example, > extended APIs could add prio/parent so that multi-prog can be attached to a > single cls_bpf instance, but even that could be a second step, imho. > I am not opposed to clsact qdisc setup if INGRESS/EGRESS is supplied (not sure how others feel about it). We could make direct_action mode default, and similarly choose prio as 1 by default instead of letting the kernel do it. Then you can just pass in NULL for bpf_tc_cls_opts and be close to what you're proposing. For protocol we can choose ETH_P_ALL by default too if the user doesn't set it. With these modifications, the equivalent would look like bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, INGRESS), NULL, &id); So as long as the user doesn't care about other details, they can just pass opts as NULL. WDYT? > Thanks, > Daniel -- Kartikeya
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Wed, Mar 31, 2021 at 02:41:40AM IST, Toke Høiland-Jørgensen wrote: > Andrii Nakryiko writes: > > > On Sun, Mar 28, 2021 at 1:11 AM Kumar Kartikeya Dwivedi > > wrote: > >> > >> On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: > >> > Is there some succinct but complete enough documentation/tutorial/etc > >> > that I can reasonably read to understand kernel APIs provided by TC > >> > (w.r.t. BPF, of course). I'm trying to wrap my head around this and > >> > whether API makes sense or not. Please share links, if you have some. > >> > > >> > >> Hi Andrii, > >> > >> Unfortunately for the kernel API part, I couldn't find any when I was > >> working > >> on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, > >> m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, > >> act_api.c, > >> act_bpf.c) to grok anything I didn't understand. There's also similar code > >> in > >> libnl (lib/route/{act,cls}.c). > >> > >> Other than that, these resources were useful (perhaps you already went > >> through > >> some/all of them): > >> > >> https://docs.cilium.io/en/latest/bpf/#tc-traffic-control > >> https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ > >> tc(8), and tc-bpf(8) man pages > >> > >> I hope this is helpful! > > > > Thanks! I'll take a look. Sorry, I'm a bit behind with all the stuff, > > trying to catch up. > > > > I was just wondering if it would be more natural instead of having > > _dev _block variants and having to specify __u32 ifindex, __u32 > > parent_id, __u32 protocol, to have some struct specifying TC > > "destination"? Maybe not, but I thought I'd bring this up early. So > > you'd have just bpf_tc_cls_attach(), and you'd do something like > > > > bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, parent_id, protocol)) > > > > or > > > > bpf_tc_cls_attach(prog_fd, TC_BLOCK(block_idx, protocol)) > > > > ? Or it's taking it too far? > > Hmm, that's not a bad idea, actually.
An earlier version of the series > did have only a single set of functions, but with way too many > arguments, which is why we ended up agreeing to split them. But > encapsulating the destination in a separate struct and combining it with > some helper macros might just make this work! I like it! Kumar, WDYT? > SGTM. > -Toke > -- Kartikeya
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Tue, Mar 30, 2021 at 2:26 PM Daniel Borkmann wrote: > > On 3/30/21 10:39 PM, Andrii Nakryiko wrote: > > On Sun, Mar 28, 2021 at 1:11 AM Kumar Kartikeya Dwivedi > > wrote: > >> On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: > >>> Is there some succinct but complete enough documentation/tutorial/etc > >>> that I can reasonably read to understand kernel APIs provided by TC > >>> (w.r.t. BPF, of course). I'm trying to wrap my head around this and > >>> whether API makes sense or not. Please share links, if you have some. > >> > >> Hi Andrii, > >> > >> Unfortunately for the kernel API part, I couldn't find any when I was > >> working > >> on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, > >> m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, > >> act_api.c, > >> act_bpf.c) to grok anything I didn't understand. There's also similar code > >> in > >> libnl (lib/route/{act,cls}.c). > >> > >> Other than that, these resources were useful (perhaps you already went > >> through > >> some/all of them): > >> > >> https://docs.cilium.io/en/latest/bpf/#tc-traffic-control > >> https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ > >> tc(8), and tc-bpf(8) man pages > >> > >> I hope this is helpful! > > > > Thanks! I'll take a look. Sorry, I'm a bit behind with all the stuff, > > trying to catch up. > > > > I was just wondering if it would be more natural instead of having > > _dev _block variants and having to specify __u32 ifindex, __u32 > > parent_id, __u32 protocol, to have some struct specifying TC > > "destination"? Maybe not, but I thought I'd bring this up early. So > > you'd have just bpf_tc_cls_attach(), and you'd do something like > > > > bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, parent_id, protocol)) > > > > or > > > > bpf_tc_cls_attach(prog_fd, TC_BLOCK(block_idx, protocol)) > > > > ? Or it's taking it too far?
> > > > But even if not, I think detaching can be unified between _dev and > > _block, can't it? > > Do we even need the _block variant? I would rather prefer to take the chance > and make it as simple as possible, and only iff really needed extend with > other APIs, for example: > >bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); > > Internally, this will create the sch_clsact qdisc & cls_bpf filter instance > iff not present yet, and attach to a default prio 1 handle 1, and _always_ in > direct-action mode. This is /as simple as it gets/ and we don't need to bother > users with more complex tc/cls_bpf internals unless desired. For example, > extended APIs could add prio/parent so that multi-prog can be attached to a > single cls_bpf instance, but even that could be a second step, imho. +1 to support sched_cls in direct-action mode only.
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On 3/30/21 10:39 PM, Andrii Nakryiko wrote: On Sun, Mar 28, 2021 at 1:11 AM Kumar Kartikeya Dwivedi wrote: On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: Is there some succinct but complete enough documentation/tutorial/etc that I can reasonably read to understand kernel APIs provided by TC (w.r.t. BPF, of course). I'm trying to wrap my head around this and whether API makes sense or not. Please share links, if you have some. Hi Andrii, Unfortunately for the kernel API part, I couldn't find any when I was working on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, act_api.c, act_bpf.c) to grok anything I didn't understand. There's also similar code in libnl (lib/route/{act,cls}.c). Other than that, these resources were useful (perhaps you already went through some/all of them): https://docs.cilium.io/en/latest/bpf/#tc-traffic-control https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ tc(8), and tc-bpf(8) man pages I hope this is helpful! Thanks! I'll take a look. Sorry, I'm a bit behind with all the stuff, trying to catch up. I was just wondering if it would be more natural instead of having _dev _block variants and having to specify __u32 ifindex, __u32 parent_id, __u32 protocol, to have some struct specifying TC "destination"? Maybe not, but I thought I'd bring this up early. So you'd have just bpf_tc_cls_attach(), and you'd do something like bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, parent_id, protocol)) or bpf_tc_cls_attach(prog_fd, TC_BLOCK(block_idx, protocol)) ? Or it's taking it too far? But even if not, I think detaching can be unified between _dev and _block, can't it? Do we even need the _block variant?
I would rather prefer to take the chance and make it as simple as possible, and only iff really needed extend with other APIs, for example: bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}); Internally, this will create the sch_clsact qdisc & cls_bpf filter instance iff not present yet, and attach to a default prio 1 handle 1, and _always_ in direct-action mode. This is /as simple as it gets/ and we don't need to bother users with more complex tc/cls_bpf internals unless desired. For example, extended APIs could add prio/parent so that multi-prog can be attached to a single cls_bpf instance, but even that could be a second step, imho. Thanks, Daniel
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
Andrii Nakryiko writes: > On Sun, Mar 28, 2021 at 1:11 AM Kumar Kartikeya Dwivedi > wrote: >> >> On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: >> > Is there some succinct but complete enough documentation/tutorial/etc >> > that I can reasonably read to understand kernel APIs provided by TC >> > (w.r.t. BPF, of course). I'm trying to wrap my head around this and >> > whether API makes sense or not. Please share links, if you have some. >> > >> >> Hi Andrii, >> >> Unfortunately for the kernel API part, I couldn't find any when I was working >> on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, >> m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, >> act_api.c, >> act_bpf.c) to grok anything I didn't understand. There's also similar code in >> libnl (lib/route/{act,cls}.c). >> >> Other than that, these resources were useful (perhaps you already went >> through >> some/all of them): >> >> https://docs.cilium.io/en/latest/bpf/#tc-traffic-control >> https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ >> tc(8), and tc-bpf(8) man pages >> >> I hope this is helpful! > > Thanks! I'll take a look. Sorry, I'm a bit behind with all the stuff, > trying to catch up. > > I was just wondering if it would be more natural instead of having > _dev _block variants and having to specify __u32 ifindex, __u32 > parent_id, __u32 protocol, to have some struct specifying TC > "destination"? Maybe not, but I thought I'd bring this up early. So > you'd have just bpf_tc_cls_attach(), and you'd do something like > > bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, parent_id, protocol)) > > or > > bpf_tc_cls_attach(prog_fd, TC_BLOCK(block_idx, protocol)) > > ? Or it's taking it too far? Hmm, that's not a bad idea, actually. An earlier version of the series did have only a single set of functions, but with way too many arguments, which is why we ended up agreeing to split them.
But encapsulating the destination in a separate struct and combining it with some helper macros might just make this work! I like it! Kumar, WDYT? -Toke
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Sun, Mar 28, 2021 at 1:11 AM Kumar Kartikeya Dwivedi wrote: > > On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: > > Is there some succinct but complete enough documentation/tutorial/etc > > that I can reasonably read to understand kernel APIs provided by TC > > (w.r.t. BPF, of course). I'm trying to wrap my head around this and > > whether API makes sense or not. Please share links, if you have some. > > > > Hi Andrii, > > Unfortunately for the kernel API part, I couldn't find any when I was working > on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, > m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, > act_api.c, > act_bpf.c) to grok anything I didn't understand. There's also similar code in > libnl (lib/route/{act,cls}.c). > > Other than that, these resources were useful (perhaps you already went through > some/all of them): > > https://docs.cilium.io/en/latest/bpf/#tc-traffic-control > https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ > tc(8), and tc-bpf(8) man pages > > I hope this is helpful! Thanks! I'll take a look. Sorry, I'm a bit behind with all the stuff, trying to catch up. I was just wondering if it would be more natural instead of having _dev _block variants and having to specify __u32 ifindex, __u32 parent_id, __u32 protocol, to have some struct specifying TC "destination"? Maybe not, but I thought I'd bring this up early. So you'd have just bpf_tc_cls_attach(), and you'd do something like bpf_tc_cls_attach(prog_fd, TC_DEV(ifindex, parent_id, protocol)) or bpf_tc_cls_attach(prog_fd, TC_BLOCK(block_idx, protocol)) ? Or it's taking it too far? But even if not, I think detaching can be unified between _dev and _block, can't it? > > -- > Kartikeya
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Mon 29 Mar 2021 at 15:32, Toke Høiland-Jørgensen wrote: > Vlad Buslov writes: > >> On Thu 25 Mar 2021 at 14:00, Kumar Kartikeya Dwivedi >> wrote: >>> This adds functions that wrap the netlink API used for adding, >>> manipulating, and removing filters and actions. These functions operate >>> directly on the loaded prog's fd, and return a handle to the filter and >>> action using an out parameter (id for tc_cls, and index for tc_act). >>> >>> The basic featureset is covered to allow for attaching, manipulation of >>> properties, and removal of filters and actions. Some additional features >>> like TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can be >>> added on top later by extending the bpf_tc_cls_opts struct. >>> >>> Support for binding actions directly to a classifier by passing them in >>> during filter creation has also been omitted for now. These actions >>> have an auto clean up property because their lifetime is bound to the >>> filter they are attached to. This can be added later, but was omitted >>> for now as direct action mode is a better alternative to it. >>> >>> An API summary: >>> >>> The BPF TC-CLS API >>> >>> bpf_tc_cls_{attach, change, replace}_{dev, block} may be used to attach, >>> change, and replace SCHED_CLS bpf classifiers. Separate sets of functions >>> are provided for network interfaces and shared filter blocks. >>> >>> bpf_tc_cls_detach_{dev, block} may be used to detach an existing SCHED_CLS >>> filter. The bpf_tc_cls_attach_id object filled in during attach, >>> change, or replace must be passed in to the detach functions for them to >>> remove the filter and its attached classifier correctly. >>> >>> bpf_tc_cls_get_info is a helper that can be used to obtain attributes >>> for the filter and classifier. The opts structure may be used to >>> choose the granularity of search, such that info for a specific filter >>> corresponding to the same loaded bpf program can be obtained.
By >>> default, the first match is returned to the user. >>> >>> Examples: >>> >>> struct bpf_tc_cls_attach_id id = {}; >>> struct bpf_object *obj; >>> struct bpf_program *p; >>> int fd, r; >>> >>> obj = bpf_object__open("foo.o"); >>> if (IS_ERR_OR_NULL(obj)) >>> return PTR_ERR(obj); >>> >>> p = bpf_object__find_program_by_title(obj, "classifier"); >>> if (IS_ERR_OR_NULL(p)) >>> return PTR_ERR(p); >>> >>> if (bpf_object__load(obj) < 0) >>> return -1; >>> >>> fd = bpf_program__fd(p); >>> >>> r = bpf_tc_cls_attach_dev(fd, if_nametoindex("lo"), >>> BPF_TC_CLSACT_INGRESS, ETH_P_IP, >>> NULL, &id); >>> if (r < 0) >>> return r; >>> >>> ... which is roughly equivalent to (after clsact qdisc setup): >>> # tc filter add dev lo ingress bpf obj /home/kkd/foo.o sec classifier >>> >>> If a user wishes to modify existing options on an attached filter, the >>> bpf_tc_cls_change_{dev, block} API may be used. Parameters like >>> chain_index, priority, and handle are ignored in the bpf_tc_cls_opts >>> struct as they cannot be modified after attaching a filter. >>> >>> Example: >>> >>> /* Optional parameters necessary to select the right filter */ >>> DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, >>> .handle = id.handle, >>> .priority = id.priority, >>> .chain_index = id.chain_index) >>> /* Turn on direct action mode */ >>> opts.direct_action = true; >>> r = bpf_tc_cls_change_dev(fd, id.ifindex, id.parent_id, >>> id.protocol, &opts, &id); >>> if (r < 0) >>> return r; >>> >>> /* Verify that the direct action mode has been set */ >>> struct bpf_tc_cls_info info = {}; >>> r = bpf_tc_cls_get_info_dev(fd, id.ifindex, id.parent_id, >>> id.protocol, &opts, &info); >>> if (r < 0) >>> return r; >>> >>> assert(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT); >>> >>> This would be roughly equivalent to doing: >>> # tc filter change dev lo egress prio handle bpf obj >>> /home/kkd/foo.o section classifier da >>> >>> ... except a new bpf program will be loaded and will replace the existing one.
>>> >>> If a user wishes to either replace an existing filter, or create a new >>> one with the same properties, they can use bpf_tc_cls_replace_dev. The >>> benefit of bpf_tc_cls_change is that it fails if no matching filter >>> exists. >>> >>> The BPF TC-ACT API >>> >>> bpf_tc_act_{attach, replace} may be used to attach and replace already >>> attached SCHED_ACT actions. Passing an index of 0 has special meaning, >>> in that an index will be automatically chosen by the kernel. The index >>> chosen by the kernel is the return value of these functions in case of >>> success. >>> >>> bpf_tc_act_detach may be used to detach a SCHED_ACT action prog >>> identified by the index parameter. T
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
Vlad Buslov writes: > On Thu 25 Mar 2021 at 14:00, Kumar Kartikeya Dwivedi wrote: >> This adds functions that wrap the netlink API used for adding, >> manipulating, and removing filters and actions. These functions operate >> directly on the loaded prog's fd, and return a handle to the filter and >> action using an out parameter (id for tc_cls, and index for tc_act). >> >> The basic featureset is covered to allow for attaching, manipulation of >> properties, and removal of filters and actions. Some additional features >> like TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can be >> added on top later by extending the bpf_tc_cls_opts struct. >> >> Support for binding actions directly to a classifier by passing them in >> during filter creation has also been omitted for now. These actions >> have an auto clean up property because their lifetime is bound to the >> filter they are attached to. This can be added later, but was omitted >> for now as direct action mode is a better alternative to it. >> >> An API summary: >> >> The BPF TC-CLS API >> >> bpf_tc_cls_{attach, change, replace}_{dev, block} may be used to attach, >> change, and replace SCHED_CLS bpf classifiers. Separate sets of functions >> are provided for network interfaces and shared filter blocks. >> >> bpf_tc_cls_detach_{dev, block} may be used to detach an existing SCHED_CLS >> filter. The bpf_tc_cls_attach_id object filled in during attach, >> change, or replace must be passed in to the detach functions for them to >> remove the filter and its attached classifier correctly. >> >> bpf_tc_cls_get_info is a helper that can be used to obtain attributes >> for the filter and classifier. The opts structure may be used to >> choose the granularity of search, such that info for a specific filter >> corresponding to the same loaded bpf program can be obtained. By >> default, the first match is returned to the user.
>> >> Examples: >> >> struct bpf_tc_cls_attach_id id = {}; >> struct bpf_object *obj; >> struct bpf_program *p; >> int fd, r; >> >> obj = bpf_object__open("foo.o"); >> if (IS_ERR_OR_NULL(obj)) >> return PTR_ERR(obj); >> >> p = bpf_object__find_program_by_title(obj, "classifier"); >> if (IS_ERR_OR_NULL(p)) >> return PTR_ERR(p); >> >> if (bpf_object__load(obj) < 0) >> return -1; >> >> fd = bpf_program__fd(p); >> >> r = bpf_tc_cls_attach_dev(fd, if_nametoindex("lo"), >>BPF_TC_CLSACT_INGRESS, ETH_P_IP, >>NULL, &id); >> if (r < 0) >> return r; >> >> ... which is roughly equivalent to (after clsact qdisc setup): >> # tc filter add dev lo ingress bpf obj /home/kkd/foo.o sec classifier >> >> If a user wishes to modify existing options on an attached filter, the >> bpf_tc_cls_change_{dev, block} API may be used. Parameters like >> chain_index, priority, and handle are ignored in the bpf_tc_cls_opts >> struct as they cannot be modified after attaching a filter. >> >> Example: >> >> /* Optional parameters necessary to select the right filter */ >> DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, >> .handle = id.handle, >> .priority = id.priority, >> .chain_index = id.chain_index) >> /* Turn on direct action mode */ >> opts.direct_action = true; >> r = bpf_tc_cls_change_dev(fd, id.ifindex, id.parent_id, >>id.protocol, &opts, &id); >> if (r < 0) >> return r; >> >> /* Verify that the direct action mode has been set */ >> struct bpf_tc_cls_info info = {}; >> r = bpf_tc_cls_get_info_dev(fd, id.ifindex, id.parent_id, >> id.protocol, &opts, &info); >> if (r < 0) >> return r; >> >> assert(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT); >> >> This would be roughly equivalent to doing: >> # tc filter change dev lo egress prio handle bpf obj >> /home/kkd/foo.o section classifier da >> >> ... except a new bpf program will be loaded and will replace the existing one.
>> >> If a user wishes to either replace an existing filter, or create a new >> one with the same properties, they can use bpf_tc_cls_replace_dev. The >> benefit of bpf_tc_cls_change is that it fails if no matching filter >> exists. >> >> The BPF TC-ACT API >> >> bpf_tc_act_{attach, replace} may be used to attach and replace already >> attached SCHED_ACT actions. Passing an index of 0 has special meaning, >> in that an index will be automatically chosen by the kernel. The index >> chosen by the kernel is the return value of these functions in case of >> success. >> >> bpf_tc_act_detach may be used to detach a SCHED_ACT action prog >> identified by the index parameter. The index 0 again has a special >> meaning, in that passing it will flush all existing SCHED_ACT actions >> loaded using the ACT API. >> >> bpf_tc_a
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Thu 25 Mar 2021 at 14:00, Kumar Kartikeya Dwivedi wrote: > This adds functions that wrap the netlink API used for adding, > manipulating, and removing filters and actions. These functions operate > directly on the loaded prog's fd, and return a handle to the filter and > action using an out parameter (id for tc_cls, and index for tc_act). > > The basic featureset is covered to allow for attaching, manipulation of > properties, and removal of filters and actions. Some additional features > like TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can be > added on top later by extending the bpf_tc_cls_opts struct. > > Support for binding actions directly to a classifier by passing them in > during filter creation has also been omitted for now. These actions > have an auto clean up property because their lifetime is bound to the > filter they are attached to. This can be added later, but was omitted > for now as direct action mode is a better alternative to it. > > An API summary: > > The BPF TC-CLS API > > bpf_tc_cls_{attach, change, replace}_{dev, block} may be used to attach, > change, and replace SCHED_CLS bpf classifiers. Separate sets of functions > are provided for network interfaces and shared filter blocks. > > bpf_tc_cls_detach_{dev, block} may be used to detach an existing SCHED_CLS > filter. The bpf_tc_cls_attach_id object filled in during attach, > change, or replace must be passed in to the detach functions for them to > remove the filter and its attached classifier correctly. > > bpf_tc_cls_get_info is a helper that can be used to obtain attributes > for the filter and classifier. The opts structure may be used to > choose the granularity of search, such that info for a specific filter > corresponding to the same loaded bpf program can be obtained. By > default, the first match is returned to the user.
> > Examples: > > struct bpf_tc_cls_attach_id id = {}; > struct bpf_object *obj; > struct bpf_program *p; > int fd, r; > > obj = bpf_object__open("foo.o"); > if (IS_ERR_OR_NULL(obj)) > return PTR_ERR(obj); > > p = bpf_object__find_program_by_title(obj, "classifier"); > if (IS_ERR_OR_NULL(p)) > return PTR_ERR(p); > > if (bpf_object__load(obj) < 0) > return -1; > > fd = bpf_program__fd(p); > > r = bpf_tc_cls_attach_dev(fd, if_nametoindex("lo"), > BPF_TC_CLSACT_INGRESS, ETH_P_IP, > NULL, &id); > if (r < 0) > return r; > > ... which is roughly equivalent to (after clsact qdisc setup): > # tc filter add dev lo ingress bpf obj /home/kkd/foo.o sec classifier > > If a user wishes to modify existing options on an attached filter, the > bpf_tc_cls_change_{dev, block} API may be used. Parameters like > chain_index, priority, and handle are ignored in the bpf_tc_cls_opts > struct as they cannot be modified after attaching a filter. > > Example: > > /* Optional parameters necessary to select the right filter */ > DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, > .handle = id.handle, > .priority = id.priority, > .chain_index = id.chain_index) > /* Turn on direct action mode */ > opts.direct_action = true; > r = bpf_tc_cls_change_dev(fd, id.ifindex, id.parent_id, > id.protocol, &opts, &id); > if (r < 0) > return r; > > /* Verify that the direct action mode has been set */ > struct bpf_tc_cls_info info = {}; > r = bpf_tc_cls_get_info_dev(fd, id.ifindex, id.parent_id, > id.protocol, &opts, &info); > if (r < 0) > return r; > > assert(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT); > > This would be roughly equivalent to doing: > # tc filter change dev lo egress prio handle bpf obj > /home/kkd/foo.o section classifier da > > ... except a new bpf program will be loaded and will replace the existing one. > > If a user wishes to either replace an existing filter, or create a new > one with the same properties, they can use bpf_tc_cls_replace_dev.
The > benefit of bpf_tc_cls_change is that it fails if no matching filter > exists. > > The BPF TC-ACT API > > bpf_tc_act_{attach, replace} may be used to attach and replace already > attached SCHED_ACT actions. Passing an index of 0 has special meaning, > in that an index will be automatically chosen by the kernel. The index > chosen by the kernel is the return value of these functions in case of > success. > > bpf_tc_act_detach may be used to detach a SCHED_ACT action prog > identified by the index parameter. The index 0 again has a special > meaning, in that passing it will flush all existing SCHED_ACT actions > loaded using the ACT API. > > bpf_tc_act_get_info is a helper to get the required attributes of a > loaded program to be able to manipulate i
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Sun, Mar 28, 2021 at 10:12:40AM IST, Andrii Nakryiko wrote: > Is there some succinct but complete enough documentation/tutorial/etc > that I can reasonably read to understand kernel APIs provided by TC > (w.r.t. BPF, of course). I'm trying to wrap my head around this and > whether API makes sense or not. Please share links, if you have some. > Hi Andrii, Unfortunately for the kernel API part, I couldn't find any when I was working on this. So I had to read the iproute2 tc code (tc_filter.c, f_bpf.c, m_action.c, m_bpf.c) and the kernel side bits (cls_api.c, cls_bpf.c, act_api.c, act_bpf.c) to grok anything I didn't understand. There's also similar code in libnl (lib/route/{act,cls}.c). Other than that, these resources were useful (perhaps you already went through some/all of them): https://docs.cilium.io/en/latest/bpf/#tc-traffic-control https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ tc(8), and tc-bpf(8) man pages I hope this is helpful! -- Kartikeya
Re: [PATCH bpf-next 3/5] libbpf: add low level TC-BPF API
On Thu, Mar 25, 2021 at 5:02 AM Kumar Kartikeya Dwivedi wrote: > > This adds functions that wrap the netlink API used for adding, > manipulating, and removing filters and actions. These functions operate > directly on the loaded prog's fd, and return a handle to the filter and > action using an out parameter (id for tc_cls, and index for tc_act). > > The basic featureset is covered to allow for attaching, manipulation of > properties, and removal of filters and actions. Some additional features > like TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can be > added on top later by extending the bpf_tc_cls_opts struct. > > Support for binding actions directly to a classifier by passing them in > during filter creation has also been omitted for now. These actions > have an auto clean up property because their lifetime is bound to the > filter they are attached to. This can be added later, but was omitted > for now as direct action mode is a better alternative to it. > > An API summary: > > The BPF TC-CLS API > > bpf_tc_cls_{attach, change, replace}_{dev, block} may be used to attach, > change, and replace SCHED_CLS bpf classifiers. Separate sets of functions > are provided for network interfaces and shared filter blocks. > > bpf_tc_cls_detach_{dev, block} may be used to detach an existing SCHED_CLS > filter. The bpf_tc_cls_attach_id object filled in during attach, > change, or replace must be passed in to the detach functions for them to > remove the filter and its attached classifier correctly. > > bpf_tc_cls_get_info is a helper that can be used to obtain attributes > for the filter and classifier. The opts structure may be used to > choose the granularity of search, such that info for a specific filter > corresponding to the same loaded bpf program can be obtained. By > default, the first match is returned to the user.
> > Examples: > > struct bpf_tc_cls_attach_id id = {}; > struct bpf_object *obj; > struct bpf_program *p; > int fd, r; > > obj = bpf_object__open("foo.o"); > if (IS_ERR_OR_NULL(obj)) > return PTR_ERR(obj); > > p = bpf_object__find_program_by_title(obj, "classifier"); > if (IS_ERR_OR_NULL(p)) > return PTR_ERR(p); > > if (bpf_object__load(obj) < 0) > return -1; > > fd = bpf_program__fd(p); > > r = bpf_tc_cls_attach_dev(fd, if_nametoindex("lo"), > BPF_TC_CLSACT_INGRESS, ETH_P_IP, > NULL, &id); > if (r < 0) > return r; > > ... which is roughly equivalent to (after clsact qdisc setup): > # tc filter add dev lo ingress bpf obj /home/kkd/foo.o sec classifier > > If a user wishes to modify existing options on an attached filter, the > bpf_tc_cls_change_{dev, block} API may be used. Parameters like > chain_index, priority, and handle are ignored in the bpf_tc_cls_opts > struct as they cannot be modified after attaching a filter. > > Example: > > /* Optional parameters necessary to select the right filter */ > DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, > .handle = id.handle, > .priority = id.priority, > .chain_index = id.chain_index) > /* Turn on direct action mode */ > opts.direct_action = true; > r = bpf_tc_cls_change_dev(fd, id.ifindex, id.parent_id, > id.protocol, &opts, &id); > if (r < 0) > return r; > > /* Verify that the direct action mode has been set */ > struct bpf_tc_cls_info info = {}; > r = bpf_tc_cls_get_info_dev(fd, id.ifindex, id.parent_id, > id.protocol, &opts, &info); > if (r < 0) > return r; > > assert(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT); > > This would be roughly equivalent to doing: > # tc filter change dev lo egress prio handle bpf obj > /home/kkd/foo.o section classifier da > > ... except a new bpf program will be loaded and will replace the existing one. > > If a user wishes to either replace an existing filter, or create a new > one with the same properties, they can use bpf_tc_cls_replace_dev.
The > benefit of bpf_tc_cls_change is that it fails if no matching filter > exists. > > The BPF TC-ACT API Is there some succinct but complete enough documentation/tutorial/etc that I can reasonably read to understand kernel APIs provided by TC (w.r.t. BPF, of course). I'm trying to wrap my head around this and whether API makes sense or not. Please share links, if you have some. > > bpf_tc_act_{attach, replace} may be used to attach and replace already > attached SCHED_ACT actions. Passing an index of 0 has special meaning, > in that an index will be automatically chosen by the kernel. The index > chosen by the kernel is the return value of these functions in case of > success.