Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name
On Fri, Nov 23, 2018 at 2:51 AM Quentin Monnet wrote:
> > 2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev
> > On 11/21, Quentin Monnet wrote:
> >> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev
> >>> On 11/20, Alexei Starovoitov wrote:
> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
> > On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> >> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> >>> On 11/20, Alexei Starovoitov wrote:
> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> > [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> > the name") fixed this issue for maps, let's do the same for programs.]
> >
> > Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> > to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> > for programs. Pre v4.14 kernels don't know about program names and
> > return an error about unexpected non-zero data. Retry sys_bpf without
> > a program name to cover older kernels.
> >
> > Signed-off-by: Stanislav Fomichev
> > ---
> >  tools/lib/bpf/bpf.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 961e1b9fc592..cbe9d757c646 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> >  	if (fd >= 0 || !log_buf || !log_buf_sz)
> >  		return fd;
> >
> > +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> > +		/* Retry the same syscall, but without the name.
> > +		 * Pre v4.14 kernels don't support prog names.
> > +		 */
> I'm afraid that will put unnecessary stress on the kernel.
> This check needs to be tighter.
> Like E2BIG and anything in the log_buf probably means that
> E2BIG came from the verifier and has nothing to do with prog_name.
> Asking the kernel to repeat is unnecessary work.
>
> In general we need to think beyond this single prog_name field.
> There are a bunch of other fields in bpf_load_program_xattr() and
> older kernels won't support them. Are we going to zero them out
> one by one and retry? I don't think that would be practical.
>>> In general, we don't want to zero anything out. However,
>>> for this particular problem the rationale is the following:
>>> in commit 88cda1c9da02 we started unconditionally setting
>>> {prog,map}->name from the 'higher' libbpf layer, which breaks
>>> users on older kernels.
>>>
> Also libbpf silently ignoring prog_name is not great for debugging.
> A warning is needed.
> But it cannot be done out of lib/bpf/bpf.c, since it's a set of
> syscall wrappers.
> Imo such an "old kernel -> let's retry" feature should probably be
> done at the lib/bpf/libbpf.c level, inside load_program().
>>> For maps, bpftool calls bpf_create_map_xattr directly, that's why
>>> for maps I did the retry on the lower level (and why for programs I
>>> initially thought about doing the same). However, in this case maybe
>>> asking the user to omit the 'name' argument might be a better option.
>>>
>>> For program names, I agree, we might think about doing it on the
>>> higher level (although I'm not sure whether we want to have different
>>> API expectations, i.e. bpf_create_map_xattr ignoring the name and
>>> bpf_load_program_xattr not ignoring the name).
>>>
>>> So given the rationale above, what do you think is the best way to
>>> move forward?
>>> 1. Same patch, but tighten the retry check inside
>>>    bpf_load_program_xattr ?
>>> 2. Move this retry logic into load_program and have different handling
>>>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>>>    into bpf_object__create_maps ?
>>>
>>> (I'm slightly leaning towards #3)
>>
>> me too. I think it's cleaner for maps to do it in
>> bpf_object__create_maps().
>> Originally bpf.c was envisioned to be a thin layer on top of the bpf
>> syscall, whereas 'smart bits' would go into libbpf.c.
>
> Can't we create in bpf_object__load() a small helper
> bpf_object__probe_caps() which would figure this out _once_ upon start
> with a few things to probe for availability
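For readers skimming the thread, the retry being debated looks roughly like the sketch below. It is only an illustration of the approach from the quoted patch, hoisted to the libbpf.c level as Alexei suggests and with the tighter "empty verifier log" condition he asks for; load_program_compat() and its exact checks are assumptions, not the code that was eventually merged.

/* Illustrative sketch only: retry a program load without the name when an
 * older kernel rejects the unknown attribute.  Not the upstream fix.
 */
#include <errno.h>
#include <stddef.h>
#include <bpf/bpf.h>	/* libbpf syscall wrappers */

static int load_program_compat(struct bpf_load_program_attr *load_attr,
			       char *log_buf, size_t log_buf_sz)
{
	int fd;

	fd = bpf_load_program_xattr(load_attr, log_buf, log_buf_sz);
	if (fd >= 0)
		return fd;

	/* Pre v4.14 kernels reject a non-zero prog_name with E2BIG and leave
	 * the verifier log empty; only then is a nameless retry worthwhile.
	 */
	if (errno == E2BIG && load_attr->name &&
	    (!log_buf || log_buf[0] == '\0')) {
		const char *saved_name = load_attr->name;

		load_attr->name = NULL;
		fd = bpf_load_program_xattr(load_attr, log_buf, log_buf_sz);
		load_attr->name = saved_name;
	}

	return fd;
}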
[PATCH v2 bpf-next] bpf: add skb->tstamp r/w access from tc clsact and cg skb progs
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.

Write access from cg skb progs only with CAP_SYS_ADMIN, since the value
will be used by downstream qdiscs. It might make sense to relax this.

Changes v1 -> v2:
  - allow access from cg skb, write only with CAP_SYS_ADMIN

Signed-off-by: Vlad Dumitrescu
---
 include/uapi/linux/bpf.h                    |  1 +
 net/core/filter.c                           | 29 +
 tools/include/uapi/linux/bpf.h              |  1 +
 tools/testing/selftests/bpf/test_verifier.c | 29 +
 4 files changed, 60 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c1554aa074659..23e2031a43d43 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2468,6 +2468,7 @@ struct __sk_buff {
 	__u32 data_meta;
 	struct bpf_flow_keys *flow_keys;
+	__u64 tstamp;
 };

 struct bpf_tunnel_key {
diff --git a/net/core/filter.c b/net/core/filter.c
index f6ca38a7d4332..65dc13aeca7c4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5573,6 +5573,10 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
 		if (size != sizeof(struct bpf_flow_keys *))
 			return false;
 		break;
+	case bpf_ctx_range(struct __sk_buff, tstamp):
+		if (size != sizeof(__u64))
+			return false;
+		break;
 	default:
 		/* Only narrow read access allowed for now. */
 		if (type == BPF_WRITE) {
@@ -5600,6 +5604,7 @@ static bool sk_filter_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, data_end):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -5638,6 +5643,10 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range(struct __sk_buff, priority):
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
+		case bpf_ctx_range(struct __sk_buff, tstamp):
+			if (!capable(CAP_SYS_ADMIN))
+				return false;
+			break;
 		default:
 			return false;
 		}
@@ -5665,6 +5674,7 @@ static bool lwt_is_valid_access(int off, int size,
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -5874,6 +5884,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, priority):
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		break;
 	default:
 		return false;
@@ -6093,6 +6104,7 @@ static bool sk_skb_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -6179,6 +6191,7 @@ static bool flow_dissector_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -6488,6 +6501,22 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,
 				      si->src_reg, off);
 		break;
+
+	case offsetof(struct __sk_buff, tstamp):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, tstamp) != 8);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_DW,
+					      si->dst_reg, si->src_reg,
+					      bpf_target_off(struct sk_buff,
+							     tstamp, 8,
+							     target_size));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_DW,
+					      si->dst_reg, si->src_reg,
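The rest of the diff is cut off above. To make the intended EDT use concrete, here is a minimal sketch of a clsact egress program that writes the new skb->tstamp field; an EDT-aware qdisc such as FQ must sit on the egress device for the stamp to matter. DELAY_NS, the section names and the fixed per-packet delay are illustrative assumptions; a real pacer would accumulate a per-flow departure time in a BPF map.

/* Minimal sketch (not from the patch): defer each packet by a fixed
 * interval by writing skb->tstamp; FQ holds the packet until that time.
 */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>

#define DELAY_NS 500000ULL	/* assumed extra delay: 0.5 ms per packet */

static __u64 (*bpf_ktime_get_ns)(void) = (void *) BPF_FUNC_ktime_get_ns;

__attribute__((section("cls_edt"), used))
int set_edt(struct __sk_buff *skb)
{
	__u64 now = bpf_ktime_get_ns();

	/* Earliest departure time: the qdisc releases the packet then. */
	if (skb->tstamp < now + DELAY_NS)
		skb->tstamp = now + DELAY_NS;

	return TC_ACT_OK;
}

char _license[] __attribute__((section("license"), used)) = "GPL";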
Re: [PATCH bpf-next] bpf: add read/write access to skb->tstamp from tc clsact progs
On Wed, Nov 21, 2018 at 5:08 AM Eric Dumazet wrote:
>
> On 11/20/2018 06:40 PM, Alexei Starovoitov wrote:
> >
> > looks good to me.
> >
> > Any particular reason you decided to disable it for cg_skb ?
> > It seems to me the same EDT approach will work from
> > cgroup-bpf skb hooks just as well and then we can have neat
> > way of controlling traffic per-container instead of tc-clsbpf global.
> > If you're already on cgroup v2 it will save you a lot of classifier
> > cycles, since you'd be able to group apps by cgroup
> > instead of relying on ip only.
>
> Vlad first wrote a complete version, but we felt explaining the _why_
> was probably harder.
>
> No particular reason, other than having to write more tests perhaps.

This sounds reasonable to me. I can prepare a v2.

Any concerns regarding capabilities? For example, data and data_end are only
available to CAP_SYS_ADMIN. Note that enforcement of this would be done by a
global component later in the pipeline (e.g., FQ qdisc).

Any opinions on sk_filter, lwt, and sk_skb before I send v2?
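As a companion to the per-container point above, a minimal userspace sketch of attaching an egress skb program to a cgroup v2 directory is shown below. The cgroup path and the function name are assumptions; prog_fd is a BPF_PROG_TYPE_CGROUP_SKB program loaded elsewhere with libbpf.

/* Hedged sketch: attach an egress skb program to one cgroup instead of a
 * global clsact classifier, so pacing applies per container.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <bpf/bpf.h>

int attach_egress_pacer(int prog_fd)
{
	int err, cg_fd = open("/sys/fs/cgroup/unified/my-container", O_RDONLY);

	if (cg_fd < 0) {
		perror("open cgroup");
		return -1;
	}

	/* Runs for every packet leaving sockets in this cgroup. */
	err = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_EGRESS, 0);
	if (err)
		perror("bpf_prog_attach");

	close(cg_fd);
	return err;
}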
[PATCH bpf-next] bpf: add read/write access to skb->tstamp from tc clsact progs
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.

Signed-off-by: Vlad Dumitrescu
---
 include/uapi/linux/bpf.h                    |  1 +
 net/core/filter.c                           | 26 +
 tools/include/uapi/linux/bpf.h              |  1 +
 tools/testing/selftests/bpf/test_verifier.c |  4
 4 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c1554aa074659..23e2031a43d43 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2468,6 +2468,7 @@ struct __sk_buff {
 	__u32 data_meta;
 	struct bpf_flow_keys *flow_keys;
+	__u64 tstamp;
 };

 struct bpf_tunnel_key {
diff --git a/net/core/filter.c b/net/core/filter.c
index f6ca38a7d4332..c45155c8e519c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5573,6 +5573,10 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
 		if (size != sizeof(struct bpf_flow_keys *))
 			return false;
 		break;
+	case bpf_ctx_range(struct __sk_buff, tstamp):
+		if (size != sizeof(__u64))
+			return false;
+		break;
 	default:
 		/* Only narrow read access allowed for now. */
 		if (type == BPF_WRITE) {
@@ -5600,6 +5604,7 @@ static bool sk_filter_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, data_end):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -5624,6 +5629,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
@@ -5665,6 +5671,7 @@ static bool lwt_is_valid_access(int off, int size,
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -5874,6 +5881,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, priority):
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		break;
 	default:
 		return false;
@@ -6093,6 +6101,7 @@ static bool sk_skb_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range(struct __sk_buff, flow_keys):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -6179,6 +6188,7 @@ static bool flow_dissector_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+	case bpf_ctx_range(struct __sk_buff, tstamp):
 		return false;
 	}

@@ -6488,6 +6498,22 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,
 				      si->src_reg, off);
 		break;
+
+	case offsetof(struct __sk_buff, tstamp):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, tstamp) != 8);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_DW,
+					      si->dst_reg, si->src_reg,
+					      bpf_target_off(struct sk_buff,
+							     tstamp, 8,
+							     target_size));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_DW,
+					      si->dst_reg, si->src_reg,
+					      bpf_target_off(struct sk_buff,
+							     tstamp, 8,
+							     target_size));
 	}

 	return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c1554aa074659..23
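The selftest part of the diff is truncated above. Purely as an illustration, a stand-alone smoke test for the new field could look like the sketch below; it hand-encodes a two-instruction SCHED_CLS program (rather than using the kernel-tree insn macros) and checks whether the verifier accepts a read of skb->tstamp. This is an assumption about how one might probe for the feature, not the selftest from the patch.

/* Hypothetical probe: load "r0 = *(u64 *)(r1 + offsetof(tstamp)); r0 = 0;
 * exit" as SCHED_CLS; only kernels carrying this change accept it.
 */
#include <stddef.h>
#include <stdio.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>	/* libbpf's bpf_load_program() */

int main(void)
{
	struct bpf_insn prog[] = {
		{ .code = BPF_LDX | BPF_MEM | BPF_DW, .dst_reg = BPF_REG_0,
		  .src_reg = BPF_REG_1,
		  .off = offsetof(struct __sk_buff, tstamp) },
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0,
		  .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	char log[4096];
	int fd;

	fd = bpf_load_program(BPF_PROG_TYPE_SCHED_CLS, prog,
			      sizeof(prog) / sizeof(prog[0]), "GPL", 0,
			      log, sizeof(log));
	printf("skb->tstamp access %s\n", fd >= 0 ? "accepted" : "rejected");
	return 0;
}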
Re: [PATCH iproute2] lib/libnetlink: fix response seq check
Hi,

On Fri, Sep 28, 2018 at 10:14 AM wrote:
>
> From: Vlad Dumitrescu
>
> Taking a one-iovec example, with rtnl->seq at 42. iovlen == 1, seq
> becomes 43 on line 604, and a message is sent with nlmsg_seq == 43. If
> a response with nlmsg_seq of 42 is received, the condition being fixed
> in this patch would incorrectly accept it.
>
> Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov")
> Signed-off-by: Vlad Dumitrescu
> ---
>  lib/libnetlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index f18dceac..4d2416bf 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -647,7 +647,7 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
>
> 			if (nladdr.nl_pid != 0 ||
> 			    h->nlmsg_pid != rtnl->local.nl_pid ||
> -			    h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen) {
> +			    h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen + 1) {
> 				/* Don't forget to skip that message. */
> 				status -= NLMSG_ALIGN(len);
> 				h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
> --
> 2.19.0.605.g01d371f741-goog

Did anybody get a chance to review this? I'm not 100% sure I'm fixing the
right thing.

Thanks,
Vlad
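A small self-contained illustration of the sequence-number window described in the commit message, using the same values (seq is 43 after the send, one iovec). accepted_old()/accepted_new() only mirror the quoted condition; they are not part of iproute2.

/* Demonstrates that the pre-fix window wrongly accepts a stale reply with
 * nlmsg_seq == 42 when the request went out with nlmsg_seq == 43.
 */
#include <stdio.h>

static int accepted_old(unsigned int resp, unsigned int seq, unsigned int iovlen)
{
	return !(resp > seq || resp < seq - iovlen);		/* pre-fix check */
}

static int accepted_new(unsigned int resp, unsigned int seq, unsigned int iovlen)
{
	return !(resp > seq || resp < seq - iovlen + 1);	/* post-fix check */
}

int main(void)
{
	unsigned int seq = 43, iovlen = 1;

	/* Prints "old: 1, new: 0": the old check accepts the stale reply. */
	printf("old: %d, new: %d\n", accepted_old(42, seq, iovlen),
	       accepted_new(42, seq, iovlen));
	return 0;
}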
Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops
On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov wrote:
>
> On 11/12/17 4:46 AM, Daniel Borkmann wrote:
>>
>> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>>>
>>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>>>>
>>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>>>>
>>>>> From: Vlad Dumitrescu
>>>>>
>>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>>>
>>>>> Signed-off-by: Vlad Dumitrescu
>>>>> ---
>>>>>  include/uapi/linux/bpf.h       |  1 +
>>>>>  net/core/filter.c              | 11 +++
>>>>>  tools/include/uapi/linux/bpf.h |  1 +
>>>>>  3 files changed, 13 insertions(+)
>>>>>
>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>> index e880ae6434ee..9757a2002513 100644
>>>>> --- a/include/uapi/linux/bpf.h
>>>>> +++ b/include/uapi/linux/bpf.h
>>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>>>>>  	__u32 remote_port;	/* Stored in network byte order */
>>>>>  	__u32 local_port;	/* stored in host byte order */
>>>>> +	__u32 priority;
>>>>>  };
>>>>>
>>>>>  /* List of known BPF sock_ops operators.
>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>> index 61c791f9f628..a6329642d047 100644
>>>>> --- a/net/core/filter.c
>>>>> +++ b/net/core/filter.c
>>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>>>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>>>  				      offsetof(struct sock_common, skc_num));
>>>>>  		break;
>>>>> +
>>>>> +	case offsetof(struct bpf_sock_ops, priority):
>>>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>>>> +
>>>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>>>> +						struct bpf_sock_ops_kern, sk),
>>>>> +				      si->dst_reg, si->src_reg,
>>>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>>>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>>>> +				      offsetof(struct sock, sk_priority));
>>>>> +		break;
>>>>
>>>> Hm, I don't think this would work, I actually think your initial patch
>>>> was ok. bpf_setsockopt() as well as bpf_getsockopt() check for
>>>> sk_fullsock(sk) right before accessing options on either socket or TCP
>>>> level, and bail out with error otherwise; in such cases we'd read
>>>> something else here and assume it's sk_priority.
>>>
>>> even if it's not fullsock, it will just read zero, no? what's a problem
>>> with that?
>>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>>> the program author will know that it's meaningless to read sk_priority,
>>> so returning zero with minimal checks is fine.
>>> While adding an extra runtime if (sk_fullsock(sk)) is unnecessary,
>>> since the safety is not compromised.
>>
>> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
>> struct request_sock itself is only 232 bytes long in total, and the
>> struct inet_timewait_sock is 208 bytes long, so you'd be accessing out
>> of bounds that way, so it cannot be ignored and assumed zero.
>
> I thought we always pass a fully allocated sock but technically not
> fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req)
> so yeah, the ctx rewrite approach won't work.
> Let's go back to access via helper.

TIL. Thanks!

Is there anything else needed from me to get the helper approach accepted?
[PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops
From: Vlad Dumitrescu

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu
---
 include/uapi/linux/bpf.h       |  1 +
 net/core/filter.c              | 11 +++
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
 	__u32 local_ip6[4];	/* Stored in network byte order */
 	__u32 remote_port;	/* Stored in network byte order */
 	__u32 local_port;	/* stored in host byte order */
+	__u32 priority;
 };

 /* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 61c791f9f628..a6329642d047 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
 				      offsetof(struct sock_common, skc_num));
 		break;
+
+	case offsetof(struct bpf_sock_ops, priority):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock, sk_priority));
+		break;
 	}
 	return insn - insn_buf;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
 	__u32 local_ip6[4];	/* Stored in network byte order */
 	__u32 remote_port;	/* Stored in network byte order */
 	__u32 local_port;	/* stored in host byte order */
+	__u32 priority;
 };

 /* List of known BPF sock_ops operators.
--
2.15.0.448.gf294e3d99a-goog
Re: [PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt
On Thu, Nov 9, 2017 at 4:43 PM, Alexei Starovoitov wrote:
> On 11/10/17 8:04 AM, Vlad Dumitrescu wrote:
>>
>> From: Vlad Dumitrescu
>>
>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>
>> Signed-off-by: Vlad Dumitrescu
>> ---
>>  net/core/filter.c | 16 ++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 1afa17935954..61c791f9f628 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>>  	if (!sk_fullsock(sk))
>>  		goto err_clear;
>>
>> +	if (level == SOL_SOCKET) {
>> +		if (optlen != sizeof(int))
>> +			goto err_clear;
>> +
>> +		switch (optname) {
>> +		case SO_PRIORITY:
>> +			*((int *)optval) = sk->sk_priority;
>
> would be cleaner to add sk_priority to 'struct bpf_sock_ops' instead.
> Faster runtime too.

I agree it will be faster, and I considered that as well. However, I was
aiming for consistency with the set function, which supports SO_PRIORITY.
Maybe both (I have no preference)?

I'll prepare the patch for bpf_sock_ops in the meantime.
[PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt
From: Vlad Dumitrescu

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu
---
 net/core/filter.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 1afa17935954..61c791f9f628 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!sk_fullsock(sk))
 		goto err_clear;

+	if (level == SOL_SOCKET) {
+		if (optlen != sizeof(int))
+			goto err_clear;
+
+		switch (optname) {
+		case SO_PRIORITY:
+			*((int *)optval) = sk->sk_priority;
+			break;
+		default:
+			goto err_clear;
+		}
 #ifdef CONFIG_INET
-	if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
+	} else if (level == SOL_TCP &&
+		   sk->sk_prot->getsockopt == tcp_getsockopt) {
 		if (optname == TCP_CONGESTION) {
 			struct inet_connection_sock *icsk = inet_csk(sk);

@@ -3304,11 +3316,11 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 		} else {
 			goto err_clear;
 		}
+#endif
 	} else {
 		goto err_clear;
 	}
 	return 0;
-#endif
 err_clear:
 	memset(optval, 0, optlen);
 	return -EINVAL;
--
2.15.0.448.gf294e3d99a-goog
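For context, a sketch of how a sock_ops program might consume this from the BPF side is shown below. The section name, the fallback SOL_SOCKET/SO_PRIORITY definitions (the asm-generic values) and the use of the reply field are illustrative assumptions; only the bpf_getsockopt() call itself corresponds to the helper extended by this patch.

/* Hedged sketch: read SO_PRIORITY from a sock_ops program through the
 * extended bpf_getsockopt() helper.  Fails (and leaves prio at 0) on
 * non-full sockets, per the sk_fullsock() check in the helper.
 */
#include <linux/bpf.h>

#ifndef SOL_SOCKET
#define SOL_SOCKET 1	/* asm-generic value */
#endif
#ifndef SO_PRIORITY
#define SO_PRIORITY 12	/* asm-generic value */
#endif

static int (*bpf_getsockopt)(void *ctx, int level, int optname,
			     void *optval, int optlen) =
	(void *) BPF_FUNC_getsockopt;

__attribute__((section("sockops"), used))
int read_prio(struct bpf_sock_ops *skops)
{
	int prio = 0;

	if (bpf_getsockopt(skops, SOL_SOCKET, SO_PRIORITY,
			   &prio, sizeof(prio)))
		return 1;	/* not a full socket; nothing to read */

	/* Purely illustrative use of the value: expose it via the reply
	 * field, which sock_ops programs are allowed to write.
	 */
	skops->reply = prio;
	return 1;
}

char _license[] __attribute__((section("license"), used)) = "GPL";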