Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name

2018-11-26 Thread Vlad Dumitrescu
On Fri, Nov 23, 2018 at 2:51 AM Quentin Monnet wrote:
>
> 2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev 
> > On 11/21, Quentin Monnet wrote:
> >> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev 
> >>> On 11/20, Alexei Starovoitov wrote:
>  On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
> > On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> >> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> >>> On 11/20, Alexei Starovoitov wrote:
>  On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> > [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> > the name") fixed this issue for maps, let's do the same for programs.]
> >
> > Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> > to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> > for programs. Pre v4.14 kernels don't know about program names and
> > return an error about unexpected non-zero data. Retry sys_bpf without
> > a program name to cover older kernels.
> >
> > Signed-off-by: Stanislav Fomichev 
> > ---
> >   tools/lib/bpf/bpf.c | 10 ++
> >   1 file changed, 10 insertions(+)
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 961e1b9fc592..cbe9d757c646 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> >   if (fd >= 0 || !log_buf || !log_buf_sz)
> >   return fd;
> >
> > + if (fd < 0 && errno == E2BIG && load_attr->name) {
> > + /* Retry the same syscall, but without the name.
> > +  * Pre v4.14 kernels don't support prog names.
> > +  */
> 
>  I'm afraid that will put unnecessary stress on the kernel.
>  This check needs to be tighter.
>  Something like E2BIG plus anything in the log_buf probably means that
>  E2BIG came from the verifier and has nothing to do with prog_name.
>  Asking the kernel to repeat the work is unnecessary.
> 
>  In general we need to think beyond this single prog_name field.
>  There are a bunch of other fields in bpf_load_program_xattr() and older
>  kernels won't support them. Are we going to zero them out one by one
>  and retry? I don't think that would be practical.
> >>> In general, we don't want to zero anything out. However,
> >>> for this particular problem the rationale is the following:
> >>> in commit 88cda1c9da02 we started unconditionally setting
> >>> {prog,map}->name from the 'higher' libbpf layer, which breaks users
> >>> on older kernels.
> >>>
>  Also libbpf silently ignoring prog_name is not great for debugging.
>  A warning is needed.
>  But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>  wrappers.
>  Imo such an "old kernel -> let's retry" feature should probably be done
>  at the lib/bpf/libbpf.c level, inside load_program().
> >>> For maps, bpftool calls bpf_create_map_xattr directly; that's why
> >>> for maps I did the retry at the lower level (and why for programs I
> >>> initially thought about doing the same). However, in this case maybe
> >>> asking the user to omit the 'name' argument might be a better option.
> >>>
> >>> For program names, I agree, we might think about doing it at the
> >>> higher level (although I'm not sure whether we want to have different
> >>> API expectations, i.e. bpf_create_map_xattr ignoring the name and
> >>> bpf_load_program_xattr not ignoring the name).
> >>>
> >>> So given that rationale above, what do you think is the best way to
> >>> move forward?
> >>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr?
> >>> 2. Move this retry logic into load_program and have different handling
> >>> for bpf_create_map_xattr vs bpf_load_program_xattr?
> >>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
> >>> into bpf_object__create_maps?
> >>>
> >>> (I'm slightly leaning towards #3)
> >>
> >> me too. I think it's cleaner for maps to do it in
> >> bpf_object__create_maps().
> >> Originally bpf.c was envisioned to be a thin layer on top of the bpf
> >> syscall, whereas the 'smart bits' would go into libbpf.c.
> >
> > Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
> > which would figure this out _once_ upon start, with a few things to
> > probe for availability
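
The probe suggested above could look roughly like the sketch below: load a
trivial program once with prog_name set and remember whether the kernel
accepts it. This is only an illustration of the idea (the raw bpf(2) wrapper,
program type and error handling are assumptions of the example), not the
actual libbpf implementation.

#include <string.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <sys/syscall.h>

static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
{
        return syscall(__NR_bpf, cmd, attr, size);
}

/* Returns 1 if the running kernel accepts prog_name, 0 otherwise. */
static int probe_prog_name(void)
{
        struct bpf_insn insns[] = {
                { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
                { .code = BPF_JMP | BPF_EXIT },
        };
        union bpf_attr attr;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
        attr.insns = (__u64)(unsigned long)insns;
        attr.insn_cnt = 2;
        attr.license = (__u64)(unsigned long)"GPL";
        strncpy(attr.prog_name, "probe", sizeof(attr.prog_name) - 1);

        fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
        if (fd >= 0) {
                close(fd);
                return 1;
        }
        /* Pre-v4.14 kernels reject the non-zero name bytes (E2BIG). */
        return 0;
}

load_program() (and bpf_object__create_maps()) could then consult the cached
result and simply not set the name on old kernels, instead of retrying every
load.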

[PATCH v2 bpf-next] bpf: add skb->tstamp r/w access from tc clsact and cg skb progs

2018-11-22 Thread Vlad Dumitrescu
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.

Write access from cg skb progs is allowed only with CAP_SYS_ADMIN, since the
value will be used by downstream qdiscs. It might make sense to relax this.
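
For illustration, a minimal sketch of how a clsact egress program might use
the new field for EDT pacing (the header paths, section name and rate
constant are assumptions of the example, not part of the patch):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define RATE_BYTES_PER_SEC 12500000ULL   /* ~100 Mbit/s, arbitrary for the example */
#define NSEC_PER_SEC       1000000000ULL

SEC("tc")
int edt_pace(struct __sk_buff *skb)
{
        __u64 now = bpf_ktime_get_ns();
        /* Serialization delay of this packet at the assumed rate. */
        __u64 delay = (__u64)skb->len * NSEC_PER_SEC / RATE_BYTES_PER_SEC;

        /* FQ holds the packet until skb->tstamp (its departure time).
         * A real pacer would keep per-flow state in a map instead of
         * pacing each packet independently as done here.
         */
        skb->tstamp = now + delay;
        return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";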

Changes v1 -> v2:
  - allow access from cg skb, write only with CAP_SYS_ADMIN

Signed-off-by: Vlad Dumitrescu 
---
 include/uapi/linux/bpf.h|  1 +
 net/core/filter.c   | 29 +
 tools/include/uapi/linux/bpf.h  |  1 +
 tools/testing/selftests/bpf/test_verifier.c | 29 +
 4 files changed, 60 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c1554aa074659..23e2031a43d43 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2468,6 +2468,7 @@ struct __sk_buff {
 
__u32 data_meta;
struct bpf_flow_keys *flow_keys;
+   __u64 tstamp;
 };
 
 struct bpf_tunnel_key {
diff --git a/net/core/filter.c b/net/core/filter.c
index f6ca38a7d4332..65dc13aeca7c4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5573,6 +5573,10 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
if (size != sizeof(struct bpf_flow_keys *))
return false;
break;
+   case bpf_ctx_range(struct __sk_buff, tstamp):
+   if (size != sizeof(__u64))
+   return false;
+   break;
default:
/* Only narrow read access allowed for now. */
if (type == BPF_WRITE) {
@@ -5600,6 +5604,7 @@ static bool sk_filter_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, data_end):
case bpf_ctx_range(struct __sk_buff, flow_keys):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -5638,6 +5643,10 @@ static bool cg_skb_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, priority):
case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
break;
+   case bpf_ctx_range(struct __sk_buff, tstamp):
+   if (!capable(CAP_SYS_ADMIN))
+   return false;
+   break;
default:
return false;
}
@@ -5665,6 +5674,7 @@ static bool lwt_is_valid_access(int off, int size,
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -5874,6 +5884,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, priority):
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
break;
default:
return false;
@@ -6093,6 +6104,7 @@ static bool sk_skb_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -6179,6 +6191,7 @@ static bool flow_dissector_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -6488,6 +6501,22 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,
  si->src_reg, off);
break;
+
+   case offsetof(struct __sk_buff, tstamp):
+   BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, tstamp) != 8);
+
+   if (type == BPF_WRITE)
+   *insn++ = BPF_STX_MEM(BPF_DW,
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sk_buff,
+tstamp, 8,
+target_size));
+   else
+   *insn++ = BPF_LDX_MEM(BPF_DW,
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sk_buff,
+tstamp, 8,
+target_size));

Re: [PATCH bpf-next] bpf: add read/write access to skb->tstamp from tc clsact progs

2018-11-21 Thread Vlad Dumitrescu
On Wed, Nov 21, 2018 at 5:08 AM Eric Dumazet  wrote:
>
>
>
> On 11/20/2018 06:40 PM, Alexei Starovoitov wrote:
>
> >
> > looks good to me.
> >
> > Any particular reason you decided to disable it for cg_skb ?
> > It seems to me the same EDT approach will work from
> > cgroup-bpf skb hooks just as well and then we can have neat
> > way of controlling traffic per-container instead of tc-clsbpf global.
> > If you're already on cgroup v2 it will save you a lot of classifier
> > cycles, since you'd be able to group apps by cgroup
> > instead of relying on ip only.
>
> Vlad first wrote a complete version, but we felt explaining the _why_
> was probably harder.
>
> No particular reason, other than having to write more tests perhaps.

This sounds reasonable to me. I can prepare a v2.

Any concerns regarding capabilities? For example, data and data_end are
only available to CAP_SYS_ADMIN. Note that enforcement of this would
be done by a global component later in the pipeline (e.g., FQ qdisc).

Any opinions on sk_filter, lwt, and sk_skb before I send v2?
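
To make the per-container idea above concrete, attaching such a program to a
container's cgroup v2 egress hook could look roughly like this userspace
sketch (current libbpf API; paths and error handling are assumptions of the
example):

#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* Attach the first program in obj_path to cgroup_path's egress hook. */
static int attach_egress(const char *obj_path, const char *cgroup_path)
{
        struct bpf_object *obj;
        struct bpf_program *prog;
        int cg_fd, err;

        obj = bpf_object__open_file(obj_path, NULL);
        if (!obj)
                return -1;

        err = bpf_object__load(obj);
        if (err)
                goto out;

        prog = bpf_object__next_program(obj, NULL);
        if (!prog) {
                err = -1;
                goto out;
        }

        cg_fd = open(cgroup_path, O_RDONLY);
        if (cg_fd < 0) {
                err = -1;
                goto out;
        }

        /* One program per container cgroup instead of one global tc filter. */
        err = bpf_prog_attach(bpf_program__fd(prog), cg_fd,
                              BPF_CGROUP_INET_EGRESS, 0);
        close(cg_fd);
out:
        bpf_object__close(obj);
        return err;
}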


[PATCH bpf-next] bpf: add read/write access to skb->tstamp from tc clsact progs

2018-11-20 Thread Vlad Dumitrescu
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.

Signed-off-by: Vlad Dumitrescu 
---
 include/uapi/linux/bpf.h|  1 +
 net/core/filter.c   | 26 +
 tools/include/uapi/linux/bpf.h  |  1 +
 tools/testing/selftests/bpf/test_verifier.c |  4 
 4 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c1554aa074659..23e2031a43d43 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2468,6 +2468,7 @@ struct __sk_buff {
 
__u32 data_meta;
struct bpf_flow_keys *flow_keys;
+   __u64 tstamp;
 };
 
 struct bpf_tunnel_key {
diff --git a/net/core/filter.c b/net/core/filter.c
index f6ca38a7d4332..c45155c8e519c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5573,6 +5573,10 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
if (size != sizeof(struct bpf_flow_keys *))
return false;
break;
+   case bpf_ctx_range(struct __sk_buff, tstamp):
+   if (size != sizeof(__u64))
+   return false;
+   break;
default:
/* Only narrow read access allowed for now. */
if (type == BPF_WRITE) {
@@ -5600,6 +5604,7 @@ static bool sk_filter_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, data_end):
case bpf_ctx_range(struct __sk_buff, flow_keys):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -5624,6 +5629,7 @@ static bool cg_skb_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
case bpf_ctx_range(struct __sk_buff, data):
case bpf_ctx_range(struct __sk_buff, data_end):
@@ -5665,6 +5671,7 @@ static bool lwt_is_valid_access(int off, int size,
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -5874,6 +5881,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, priority):
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
break;
default:
return false;
@@ -6093,6 +6101,7 @@ static bool sk_skb_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -6179,6 +6188,7 @@ static bool flow_dissector_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, tc_classid):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+   case bpf_ctx_range(struct __sk_buff, tstamp):
return false;
}
 
@@ -6488,6 +6498,22 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,
  si->src_reg, off);
break;
+
+   case offsetof(struct __sk_buff, tstamp):
+   BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, tstamp) != 8);
+
+   if (type == BPF_WRITE)
+   *insn++ = BPF_STX_MEM(BPF_DW,
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sk_buff,
+tstamp, 8,
+target_size));
+   else
+   *insn++ = BPF_LDX_MEM(BPF_DW,
+ si->dst_reg, si->src_reg,
+ bpf_target_off(struct sk_buff,
+tstamp, 8,
+target_size));
}
 
return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c1554aa074659..23e2031a43d43 100644

Re: [PATCH iproute2] lib/libnetlink: fix response seq check

2018-10-03 Thread Vlad Dumitrescu
Hi,

On Fri, Sep 28, 2018 at 10:14 AM  wrote:
>
> From: Vlad Dumitrescu 
>
> Take a one-iovec example with rtnl->seq at 42: iovlen == 1, seq becomes
> 43 on line 604, and a message is sent with nlmsg_seq == 43. If a
> response with nlmsg_seq of 42 is received, the condition being fixed in
> this patch would incorrectly accept it.
>
> Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov")
> Signed-off-by: Vlad Dumitrescu 
> ---
>  lib/libnetlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index f18dceac..4d2416bf 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -647,7 +647,7 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
>
> if (nladdr.nl_pid != 0 ||
> h->nlmsg_pid != rtnl->local.nl_pid ||
> -                   h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen) {
> +                   h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen + 1) {
> /* Don't forget to skip that message. */
> status -= NLMSG_ALIGN(len);
>                 h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
> --
> 2.19.0.605.g01d371f741-goog

Did anybody get a chance to review this? I'm not 100% sure I'm fixing
the right thing.

Thanks,
Vlad
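
To make the accepted window concrete, here is a small standalone check of the
old versus fixed condition, using the numbers from the commit message (seq
advanced to 43, iovlen == 1, and only 43 was actually sent); this is just an
illustration, not part of the patch:

#include <stdio.h>

static int accepted_old(unsigned int resp, unsigned int seq, unsigned int iovlen)
{
        /* old check: drop if resp > seq || resp < seq - iovlen */
        return !(resp > seq || resp < seq - iovlen);
}

static int accepted_new(unsigned int resp, unsigned int seq, unsigned int iovlen)
{
        /* fixed check: drop if resp > seq || resp < seq - iovlen + 1 */
        return !(resp > seq || resp < seq - iovlen + 1);
}

int main(void)
{
        unsigned int seq = 43, iovlen = 1;

        printf("old: 42 -> %d, 43 -> %d\n",
               accepted_old(42, seq, iovlen), accepted_old(43, seq, iovlen));
        printf("new: 42 -> %d, 43 -> %d\n",
               accepted_new(42, seq, iovlen), accepted_new(43, seq, iovlen));
        /* old: 42 -> 1, 43 -> 1   (42 incorrectly accepted)
         * new: 42 -> 0, 43 -> 1
         */
        return 0;
}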


Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

2017-11-13 Thread Vlad Dumitrescu
On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov  wrote:
>
> On 11/12/17 4:46 AM, Daniel Borkmann wrote:
>>
>> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>>>
>>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>>>>
>>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>>>>
>>>>> From: Vlad Dumitrescu 
>>>>>
>>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>>>
>>>>> Signed-off-by: Vlad Dumitrescu 
>>>>> ---
>>>>>   include/uapi/linux/bpf.h   |  1 +
>>>>>   net/core/filter.c  | 11 +++
>>>>>   tools/include/uapi/linux/bpf.h |  1 +
>>>>>   3 files changed, 13 insertions(+)
>>>>>
>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>> index e880ae6434ee..9757a2002513 100644
>>>>> --- a/include/uapi/linux/bpf.h
>>>>> +++ b/include/uapi/linux/bpf.h
>>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>>>   __u32 local_ip6[4];/* Stored in network byte order */
>>>>>   __u32 remote_port;/* Stored in network byte order */
>>>>>   __u32 local_port;/* stored in host byte order */
>>>>> +__u32 priority;
>>>>>   };
>>>>> /* List of known BPF sock_ops operators.
>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>> index 61c791f9f628..a6329642d047 100644
>>>>> --- a/net/core/filter.c
>>>>> +++ b/net/core/filter.c
>>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>>>>> bpf_access_type type,
>>>>>   *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>>> offsetof(struct sock_common, skc_num));
>>>>>   break;
>>>>> +
>>>>> +case offsetof(struct bpf_sock_ops, priority):
>>>>> +BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>>>> +
>>>>> +*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>>>> +struct bpf_sock_ops_kern, sk),
>>>>> +  si->dst_reg, si->src_reg,
>>>>> +  offsetof(struct bpf_sock_ops_kern, sk));
>>>>> +*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>>>> +  offsetof(struct sock, sk_priority));
>>>>> +break;
>>>>
>>>>
>>>> Hm, I don't think this would work, I actually think your initial patch
>>>> was ok. bpf_setsockopt() as well as bpf_getsockopt() check for
>>>> sk_fullsock(sk) right before accessing options on either socket or TCP
>>>> level, and bail out with an error otherwise; in such cases we'd read
>>>> something else here and assume it's sk_priority.
>>>
>>>
>>> even if it's not fullsock, it will just read zero, no? what's a problem
>>> with that?
>>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>>> the program author will know that it's meaningless to read sk_priority,
>>> so returning zero with minimal checks is fine.
>>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>>> since the safety is not compromised.
>>
>>
>> Hm, on my kernel, struct sock has the 4-byte sk_priority at offset 440,
>> struct request_sock itself is only 232 bytes long in total, and struct
>> inet_timewait_sock is 208 bytes long, so you'd be accessing out of bounds
>> that way, so it cannot be ignored and assumed zero.
>
>
> I thought we always passed a fully allocated sock, but technically it's
> not a fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req),
> so yeah, the ctx rewrite approach won't work.
> Let's go back to access via the helper.
>

TIL. Thanks!

Is there anything else needed from me to get the helper approach accepted?
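
For reference, a minimal sketch of the helper approach from a sock_ops
program, assuming bpf_getsockopt gains SOL_SOCKET/SO_PRIORITY support as in
the patch at the end of this digest (header path, section name and the
bpf_printk output are assumptions of the example):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#ifndef SOL_SOCKET
#define SOL_SOCKET  1
#endif
#ifndef SO_PRIORITY
#define SO_PRIORITY 12          /* value from asm-generic/socket.h */
#endif

SEC("sockops")
int read_prio(struct bpf_sock_ops *skops)
{
        int prio = 0;

        /* The helper bails out (and zeroes optval) unless the socket is a
         * full socket, so the program needs no fullsock check of its own.
         */
        if (bpf_getsockopt(skops, SOL_SOCKET, SO_PRIORITY,
                           &prio, sizeof(prio)) == 0)
                bpf_printk("sk_priority = %d", prio);

        return 1;
}

char _license[] SEC("license") = "GPL";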


[PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

2017-11-10 Thread Vlad Dumitrescu
From: Vlad Dumitrescu 

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu 
---
 include/uapi/linux/bpf.h   |  1 +
 net/core/filter.c  | 11 +++
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
__u32 local_ip6[4]; /* Stored in network byte order */
__u32 remote_port;  /* Stored in network byte order */
__u32 local_port;   /* stored in host byte order */
+   __u32 priority;
 };
 
 /* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 61c791f9f628..a6329642d047 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
  offsetof(struct sock_common, skc_num));
break;
+
+   case offsetof(struct bpf_sock_ops, priority):
+   BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+   *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+   struct bpf_sock_ops_kern, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern, sk));
+   *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+ offsetof(struct sock, sk_priority));
+   break;
}
return insn - insn_buf;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
__u32 local_ip6[4]; /* Stored in network byte order */
__u32 remote_port;  /* Stored in network byte order */
__u32 local_port;   /* stored in host byte order */
+   __u32 priority;
 };
 
 /* List of known BPF sock_ops operators.
-- 
2.15.0.448.gf294e3d99a-goog



Re: [PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt

2017-11-10 Thread Vlad Dumitrescu
On Thu, Nov 9, 2017 at 4:43 PM, Alexei Starovoitov  wrote:
> On 11/10/17 8:04 AM, Vlad Dumitrescu wrote:
>>
>> From: Vlad Dumitrescu 
>>
>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>
>> Signed-off-by: Vlad Dumitrescu 
>> ---
>>  net/core/filter.c | 16 ++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 1afa17935954..61c791f9f628 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>> if (!sk_fullsock(sk))
>> goto err_clear;
>>
>> +   if (level == SOL_SOCKET) {
>> +   if (optlen != sizeof(int))
>> +   goto err_clear;
>> +
>> +   switch (optname) {
>> +   case SO_PRIORITY:
>> +   *((int *)optval) = sk->sk_priority;
>
>
> would be cleaner to add sk_priority to 'struct bpf_sock_ops' instead.
> Faster runtime too.
>

I agree it will be faster, and I considered that as well. However, I
was aiming for consistency with the set function, which supports
SO_PRIORITY.

Maybe both (I have no preference)? I'll prepare the patch for
bpf_sock_ops in the meantime.


[PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt

2017-11-09 Thread Vlad Dumitrescu
From: Vlad Dumitrescu 

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu 
---
 net/core/filter.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 1afa17935954..61c791f9f628 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
if (!sk_fullsock(sk))
goto err_clear;
 
+   if (level == SOL_SOCKET) {
+   if (optlen != sizeof(int))
+   goto err_clear;
+
+   switch (optname) {
+   case SO_PRIORITY:
+   *((int *)optval) = sk->sk_priority;
+   break;
+   default:
+   goto err_clear;
+   }
 #ifdef CONFIG_INET
-   if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
+   } else if (level == SOL_TCP &&
+  sk->sk_prot->getsockopt == tcp_getsockopt) {
if (optname == TCP_CONGESTION) {
struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -3304,11 +3316,11 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
} else {
goto err_clear;
}
+#endif
} else {
goto err_clear;
}
return 0;
-#endif
 err_clear:
memset(optval, 0, optlen);
return -EINVAL;
-- 
2.15.0.448.gf294e3d99a-goog