Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Wed, Aug 27, 2014 at 12:18 PM, Stephen Hemminger wrote:
> Something in man page format similar to FreeBSD man page:
> http://www.freebsd.org/cgi/man.cgi?bpf(4)
>
> would be more readable and reviewable.

OK, I will chop it into the smallest diff possible and will add a doc
for the syscall only.
I guess the problem is that we have too many docs now that talk about
everything.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On 08/27/2014 09:18 PM, Stephen Hemminger wrote:
> Something in man page format similar to FreeBSD man page:
> http://www.freebsd.org/cgi/man.cgi?bpf(4)
>
> would be more readable and reviewable.

I think at some point, we could perhaps do a section 7 page with a
general overview of the engine and where it can be applied, and let
the syscall page partially refer to it so that it doesn't get too long.

So far, we tried to squeeze everything into
Documentation/networking/filter.txt, and that itself is quite long
already.
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
Something in man page format similar to FreeBSD man page:
http://www.freebsd.org/cgi/man.cgi?bpf(4)

would be more readable and reviewable.
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 9:57 PM, Alexei Starovoitov wrote:
> On Tue, Aug 26, 2014 at 9:49 PM, Andy Lutomirski wrote:
>> On Tue, Aug 26, 2014 at 9:35 PM, Alexei Starovoitov wrote:
>>> On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski wrote:
>>>> On Aug 26, 2014 7:29 PM, "Alexei Starovoitov" wrote:
>>>>>
>>>>> Hi Ingo, David,
>>>>>
>>>>> posting whole thing again as RFC to get feedback on syscall only.
>>>>> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
>>>>> I'll split them into small chunks as requested and will repost without
>>>>> RFC.
>>>>
>>>> IMO it's much easier to review a syscall if we just look at a
>>>> specification of what it does.  The code is, in some sense, secondary.
>>>
>>> 'specification of what it does'... hmm, you mean beyond what's
>>> there in commit logs and in Documentation/networking/filter.txt ?
>>> Don't the samples at the end give an idea of 'what it does'?
>>> I'm happy to add a 'specification', I just don't understand yet what
>>> it is supposed to talk about beyond what's already written.
>>> I understand that the patches are missing an explanation of 'why'
>>> the syscall is being added, but I don't think that's what you're asking...
>>
>> I mean a hopefully short document that defines what the syscall does.
>> It should be precise enough that one could, in principle, implement
>> the syscall just by reading the document and that one could use the
>> syscall just by reading the document.
>>
>> Given that there's a whole instruction set to go with it, it may end
>> up being moderately complicated or saying things like "see this other
>> thing for a description of the instruction set" and "there are some
>> extensible sets of functions you can call with it".
>
> I'm still lost.
>
> Here is the quote from Documentation/networking/filter.txt
> "
> 'maps' is a generic storage of different types for sharing data between kernel
> and userspace.
>
> The maps are accessed from user space via BPF syscall, which has commands:
> - create a map with given type and attributes
>   map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
>   using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
>   returns process-local file descriptor or negative error
>
> - lookup key in a given map
>   err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
>   using attr->map_fd, attr->key, attr->value
>   returns zero and stores found elem into value or negative error
>
> - create or update key/value pair in a given map
>   err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
>   using attr->map_fd, attr->key, attr->value
>   returns zero or negative error
>
> - find and delete element by key in a given map
>   err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
>   using attr->map_fd, attr->key
>
> - to delete map: close(fd)
>   Exiting process will delete maps automatically
>
> userspace programs use this API to create/populate/read
> maps that eBPF programs are concurrently updating.
> "
> and more in commit log:
> "
> - load eBPF program
>   fd = bpf(BPF_PROG_LOAD, union bpf_attr *attr, u32 size)
>
>   where 'attr' is
>   struct {
>           enum bpf_prog_type prog_type;
>           __u32 insn_cnt;
>           struct bpf_insn __user *insns;
>           const char __user *license;
>   };
>   insns - array of eBPF instructions
>   license - must be GPL compatible to call helper functions marked gpl_only
>
> - unload eBPF program
>   close(fd)
> "
>
> Isn't it short, and doesn't it describe what it does?
> Do you want me to describe what an eBPF program can do?

The problem is that everyone needs to dig around a very long patch
series to find it.  Since you're asking for a review of a syscall, it
would be nice to have everything needed to review whether the syscall
is a good idea in its present form in one place and to keep the amount
of email under control.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
> I'm personally not reviewing such a large patch series, sorry.
>
> You need to submit smaller sets if you want to get reasonable
> review of your changes and ideas.

Hello.

As well, this clogs up the mailing boxes of other people who have no
interest in the patch set.

Thank you,
Steven Stewart-Gallus
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
From: Alexei Starovoitov
Date: Tue, 26 Aug 2014 19:29:14 -0700

> posting whole thing again as RFC to get feedback on syscall only.
> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,

I'm personally not reviewing such a large patch series, sorry.

You need to submit smaller sets if you want to get reasonable
review of your changes and ideas.
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 9:49 PM, Andy Lutomirski wrote:
> On Tue, Aug 26, 2014 at 9:35 PM, Alexei Starovoitov wrote:
>> On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski wrote:
>>> On Aug 26, 2014 7:29 PM, "Alexei Starovoitov" wrote:
>>>>
>>>> Hi Ingo, David,
>>>>
>>>> posting whole thing again as RFC to get feedback on syscall only.
>>>> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
>>>> I'll split them into small chunks as requested and will repost without
>>>> RFC.
>>>
>>> IMO it's much easier to review a syscall if we just look at a
>>> specification of what it does.  The code is, in some sense, secondary.
>>
>> 'specification of what it does'... hmm, you mean beyond what's
>> there in commit logs and in Documentation/networking/filter.txt ?
>> Don't the samples at the end give an idea of 'what it does'?
>> I'm happy to add a 'specification', I just don't understand yet what
>> it is supposed to talk about beyond what's already written.
>> I understand that the patches are missing an explanation of 'why'
>> the syscall is being added, but I don't think that's what you're asking...
>
> I mean a hopefully short document that defines what the syscall does.
> It should be precise enough that one could, in principle, implement
> the syscall just by reading the document and that one could use the
> syscall just by reading the document.
>
> Given that there's a whole instruction set to go with it, it may end
> up being moderately complicated or saying things like "see this other
> thing for a description of the instruction set" and "there are some
> extensible sets of functions you can call with it".

I'm still lost.

Here is the quote from Documentation/networking/filter.txt
"
'maps' is a generic storage of different types for sharing data between kernel
and userspace.

The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes
  map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
  using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
  returns process-local file descriptor or negative error

- lookup key in a given map
  err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->value
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->value
  returns zero or negative error

- find and delete element by key in a given map
  err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key

- to delete map: close(fd)
  Exiting process will delete maps automatically

userspace programs use this API to create/populate/read
maps that eBPF programs are concurrently updating.
"
and more in commit log:
"
- load eBPF program
  fd = bpf(BPF_PROG_LOAD, union bpf_attr *attr, u32 size)

  where 'attr' is
  struct {
          enum bpf_prog_type prog_type;
          __u32 insn_cnt;
          struct bpf_insn __user *insns;
          const char __user *license;
  };
  insns - array of eBPF instructions
  license - must be GPL compatible to call helper functions marked gpl_only

- unload eBPF program
  close(fd)
"

Isn't it short, and doesn't it describe what it does?
Do you want me to describe what an eBPF program can do?
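For readers who want the calling convention in one place, here is a minimal sketch of the four map commands quoted above, written as a toy in-process model rather than a real syscall. Everything prefixed `sk_` or `SK_` is hypothetical, including the fixed table sizes and the specific error codes (`-ENOENT`, `-E2BIG`, `-EBADF`, `-EFAULT`); the quoted text only promises "zero or negative error". The union reuses the field names from the quote (`map_type`, `key_size`, `value_size`, `max_entries`, `map_fd`, `key`, `value`), but its layout is an assumption and not the one in the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: a toy user-space model, not kernel code. */
enum sk_cmd { SK_MAP_CREATE, SK_MAP_LOOKUP_ELEM, SK_MAP_UPDATE_ELEM, SK_MAP_DELETE_ELEM };

union sk_attr {
    struct { uint32_t map_type, key_size, value_size, max_entries; }; /* create */
    struct { int map_fd; const void *key; void *value; };  /* element commands */
};

#define SK_MAX_MAPS    4
#define SK_MAX_ENTRIES 8
#define SK_MAX_BYTES   8

struct sk_map {
    int in_use;
    uint32_t key_size, value_size, max_entries, count;
    unsigned char keys[SK_MAX_ENTRIES][SK_MAX_BYTES];
    unsigned char vals[SK_MAX_ENTRIES][SK_MAX_BYTES];
};

static struct sk_map sk_maps[SK_MAX_MAPS]; /* array index plays the role of the fd */

/* Return the slot holding key, or -1 if the key is absent. */
static int sk_find(const struct sk_map *m, const void *key)
{
    for (uint32_t i = 0; i < m->count; i++)
        if (memcmp(m->keys[i], key, m->key_size) == 0)
            return (int)i;
    return -1;
}

/* One entry point and one attribute union, mirroring bpf(cmd, attr, size). */
int sk_bpf(int cmd, union sk_attr *attr, unsigned int size)
{
    if (attr == NULL || size != sizeof(*attr))
        return -EINVAL;

    if (cmd == SK_MAP_CREATE) {
        if (attr->key_size == 0 || attr->key_size > SK_MAX_BYTES ||
            attr->value_size == 0 || attr->value_size > SK_MAX_BYTES ||
            attr->max_entries == 0 || attr->max_entries > SK_MAX_ENTRIES)
            return -EINVAL;
        for (int fd = 0; fd < SK_MAX_MAPS; fd++) {
            if (sk_maps[fd].in_use)
                continue;
            memset(&sk_maps[fd], 0, sizeof(sk_maps[fd]));
            sk_maps[fd].in_use = 1;
            sk_maps[fd].key_size = attr->key_size;
            sk_maps[fd].value_size = attr->value_size;
            sk_maps[fd].max_entries = attr->max_entries;
            return fd; /* non-negative "descriptor" on success */
        }
        return -ENOMEM;
    }

    if (attr->map_fd < 0 || attr->map_fd >= SK_MAX_MAPS || !sk_maps[attr->map_fd].in_use)
        return -EBADF;
    if (attr->key == NULL)
        return -EFAULT;

    struct sk_map *m = &sk_maps[attr->map_fd];
    int i = sk_find(m, attr->key);
    assert(m->count <= m->max_entries);

    switch (cmd) {
    case SK_MAP_LOOKUP_ELEM:
        if (attr->value == NULL)
            return -EFAULT;
        if (i < 0)
            return -ENOENT;
        memcpy(attr->value, m->vals[i], m->value_size); /* store found elem into value */
        return 0;
    case SK_MAP_UPDATE_ELEM:
        if (attr->value == NULL)
            return -EFAULT;
        if (i < 0) {                       /* key absent: create a new pair */
            if (m->count == m->max_entries)
                return -E2BIG;
            i = (int)m->count++;
            memcpy(m->keys[i], attr->key, m->key_size);
        }
        memcpy(m->vals[i], attr->value, m->value_size);
        return 0;
    case SK_MAP_DELETE_ELEM:
        if (i < 0)
            return -ENOENT;
        m->count--;
        /* Keep the table dense: move the last entry into the freed slot. */
        memmove(m->keys[i], m->keys[m->count], SK_MAX_BYTES);
        memmove(m->vals[i], m->vals[m->count], SK_MAX_BYTES);
        return 0;
    }
    return -EINVAL;
}
```

A caller would fill `union sk_attr` with the create fields, call `sk_bpf(SK_MAP_CREATE, &attr, sizeof(attr))`, and then reuse the same union with `map_fd`, `key`, and `value` for the element commands. Passing `sizeof(attr)` explicitly on every call is what leaves room for the union to grow later.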
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 9:35 PM, Alexei Starovoitov wrote:
> On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski wrote:
>> On Aug 26, 2014 7:29 PM, "Alexei Starovoitov" wrote:
>>>
>>> Hi Ingo, David,
>>>
>>> posting whole thing again as RFC to get feedback on syscall only.
>>> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
>>> I'll split them into small chunks as requested and will repost without RFC.
>>
>> IMO it's much easier to review a syscall if we just look at a
>> specification of what it does.  The code is, in some sense, secondary.
>
> 'specification of what it does'... hmm, you mean beyond what's
> there in commit logs and in Documentation/networking/filter.txt ?
> Don't the samples at the end give an idea of 'what it does'?
> I'm happy to add a 'specification', I just don't understand yet what
> it is supposed to talk about beyond what's already written.
> I understand that the patches are missing an explanation of 'why'
> the syscall is being added, but I don't think that's what you're asking...

I mean a hopefully short document that defines what the syscall does.
It should be precise enough that one could, in principle, implement
the syscall just by reading the document and that one could use the
syscall just by reading the document.

Given that there's a whole instruction set to go with it, it may end
up being moderately complicated or saying things like "see this other
thing for a description of the instruction set" and "there are some
extensible sets of functions you can call with it".

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski wrote:
> On Aug 26, 2014 7:29 PM, "Alexei Starovoitov" wrote:
>>
>> Hi Ingo, David,
>>
>> posting whole thing again as RFC to get feedback on syscall only.
>> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
>> I'll split them into small chunks as requested and will repost without RFC.
>
> IMO it's much easier to review a syscall if we just look at a
> specification of what it does.  The code is, in some sense, secondary.

'specification of what it does'... hmm, you mean beyond what's
there in commit logs and in Documentation/networking/filter.txt ?
Don't the samples at the end give an idea of 'what it does'?
I'm happy to add a 'specification', I just don't understand yet what
it is supposed to talk about beyond what's already written.
I understand that the patches are missing an explanation of 'why'
the syscall is being added, but I don't think that's what you're asking...
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Aug 26, 2014 7:29 PM, "Alexei Starovoitov" wrote:
>
> Hi Ingo, David,
>
> posting whole thing again as RFC to get feedback on syscall only.
> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
> I'll split them into small chunks as requested and will repost without RFC.

IMO it's much easier to review a syscall if we just look at a
specification of what it does.  The code is, in some sense, secondary.

--Andy
[PATCH RFC v7 net-next 00/28] BPF syscall
Hi Ingo, David,

posting whole thing again as RFC to get feedback on syscall only.
If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
I'll split them into small chunks as requested and will repost without RFC.

Right now please only review syscall API:
patch 0003 introduces sys_bpf and first BPF_MAP_CREATE command
patch 0005 adds four more commands to the same syscall
patch 0007 adds BPF_PROG_LOAD command
patch 0010 extends BPF_PROG_LOAD command with 3 more attributes
patch 0021 adds user space wrapper for BPF syscall
patch 0023 uses these wrappers in verifier testsuite

Note that additions of commands and attributes kept this syscall backwards
compatible from one patch to another. I've decided not to bother with
forward compatibility for now. We can address it later the way
perf_event_open did.

Please ignore other patches, since I cannot easily remove them without
breaking compilation.

btw, tested on x64/i386 and compile-tested on arm with NET-less config.

V6->V7:
- only BPF syscall interface changed from long+nlattr+a_lot_of_type_casts
  to single 'union bpf_attr'. It pretty much removed all type casts in kernel
  and in user space that were there because of 'long' and because of 'nlattr'
  Thanks for feedback. I think this version is definitely cleaner.

As a side note I've addressed Cong's comment regarding commit log.
Now it documents syscall itself instead of wrappers of syscall.

If anyone prefers to see patches in the browser, they are here:
https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/log/?h=v7

Thanks!

Alexei Starovoitov (28):
  net: filter: add "load 64-bit immediate" eBPF instruction
  net: filter: split filter.h and expose eBPF to user space
  bpf: introduce syscall(BPF, ...) and BPF maps
  bpf: enable bpf syscall on x64 and i386
  bpf: add lookup/update/delete/iterate methods to BPF maps
  bpf: add hashtable type of BPF maps
  bpf: expand BPF syscall with program load/unload
  bpf: handle pseudo BPF_CALL insn
  bpf: verifier (add docs)
  bpf: verifier (add ability to receive verification log)
  bpf: handle pseudo BPF_LD_IMM64 insn
  bpf: verifier (add branch/goto checks)
  bpf: verifier (add verifier core)
  bpf: verifier (add state prunning optimization)
  bpf: allow eBPF programs to use maps
  bpf: split eBPF out of NET
  tracing: allow eBPF programs to be attached to events
  tracing: allow eBPF programs call printk()
  tracing: allow eBPF programs to be attached to kprobe/kretprobe
  tracing: allow eBPF programs to call ktime_get_ns() and get_current()
  samples: bpf: add mini eBPF library to manipulate maps and programs
  samples: bpf: example of tracing filters with eBPF
  bpf: verifier test
  bpf: llvm backend
  samples: bpf: elf file loader
  samples: bpf: eBPF example in C
  samples: bpf: counting eBPF example in C
  samples: bpf: IO latency analysis (iosnoop/heatmap)

 Documentation/networking/filter.txt |  313 +++-
 arch/Kconfig                        |    3 +
 arch/x86/net/bpf_jit_comp.c         |   17 +
 arch/x86/syscalls/syscall_32.tbl    |    1 +
 arch/x86/syscalls/syscall_64.tbl    |    1 +
 fs/btrfs/super.c                    |    3 +
 include/linux/bpf.h                 |  139 ++
 include/linux/filter.h              |  303 +---
 include/linux/ftrace_event.h        |    5 +
 include/linux/syscalls.h            |    3 +-
 include/trace/bpf_trace.h           |   23 +
 include/trace/ftrace.h              |   25 +
 include/uapi/asm-generic/unistd.h   |    4 +-
 include/uapi/linux/Kbuild           |    1 +
 include/uapi/linux/bpf.h            |  439 +
 kernel/Makefile                     |    2 +-
 kernel/bpf/Makefile                 |    2 +-
 kernel/bpf/core.c                   |   17 +
 kernel/bpf/hashtab.c                |  365
 kernel/bpf/syscall.c                |  645 +++
 kernel/bpf/verifier.c               | 1910
 kernel/sys_ni.c                     |    3 +
 kernel/trace/Kconfig                |    1 +
 kernel/trace/Makefile               |    1 +
 kernel/trace/bpf_trace.c            |  264 +++
 kernel/trace/trace.h                |    3 +
 kernel/trace/trace_events.c         |   41 +-
 kernel/trace/trace_events_filter.c  |   72 +-
 kernel/trace/trace_kprobe.c         |   28 +
 kernel/trace/trace_syscalls.c       |   32 +
 lib/test_bpf.c                      |   21 +
 net/Kconfig                         |    1 +
 net/core/filter.c                   |    2 +
 samples/bpf/Makefile                |   28 +
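The cover letter defers forward compatibility to "the way perf_event_open did". As a rough illustration of that convention as commonly described (a caller built against a newer, larger attribute layout is accepted only if every byte the receiver does not know about is zero), here is a minimal user-space sketch. The names `fc_attr_v1` and `fc_copy_attr` are hypothetical, and the choice of `-E2BIG` and `-EINVAL` is an assumption for illustration; none of this is taken from the patch series.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "version 1" layout that the receiving side understands. */
struct fc_attr_v1 {
    unsigned int map_type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
};

/*
 * Copy a caller-supplied attribute blob of user_size bytes into the known
 * layout.
 *  - A smaller (older) blob is accepted and the missing fields read as zero.
 *  - A larger (newer) blob is accepted only if the unknown tail is all zero,
 *    meaning the caller did not request any feature we do not understand.
 */
int fc_copy_attr(struct fc_attr_v1 *dst, const void *src, size_t user_size)
{
    const unsigned char *bytes = src;
    size_t known = sizeof(*dst);

    if (dst == NULL || src == NULL || user_size == 0)
        return -EINVAL;

    if (user_size > known) {
        for (size_t i = known; i < user_size; i++)
            if (bytes[i] != 0)
                return -E2BIG; /* caller uses a field we do not know */
        user_size = known;
    }

    memset(dst, 0, known);          /* fields the caller did not send are zero */
    memcpy(dst, bytes, user_size);
    return 0;
}
```

With this shape, the explicit `size` argument in `bpf(int cmd, union bpf_attr *attr, unsigned int size)` is what would let a receiver tell older and newer callers apart without a separate version number.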
[PATCH RFC v7 net-next 00/28] BPF syscall
Hi Ingo, David, posting whole thing again as RFC to get feedback on syscall only. If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok, I'll split them into small chunks as requested and will repost without RFC. Right now please only review syscall API patch 0003 introduces sys_bpf and first BPF_MAP_CREATE command patch 0005 adds four more commands to the same syscall patch 0007 adds BPF_PROG_LOAD command patch 0010 extends BPF_PROG_LOAD command with 3 more attributes patch 0021 adds user space wrapper for BPF syscall patch 0023 uses these wrappers in verifier testsuite Note that additions of commands and attributes kept this syscall backwards compatible from one patch to another. I've decided not to bother with forward compatiblity for now. We can address it later the way perf_event_open did. Please ignore other patches, since I cannot easily remove them without breaking compilation btw, tested on x64/i386 and comiled tested on arm with NET-less config. V6-V7: - only BPF syscall interface changed from long+nlattr+a_lot_of_type_casts to single 'union bpf_attr'. It pretty much removed all type casts in kernel and in user space that were there because of 'long' and because of 'nlattr' Thanks for feedback. I think this version is definitely cleaner. As a side note I've addressed Cong's comment regarding commit log. Now it documents syscall itself instead of wrappers of syscall. If anyone prefers to see patches in the browser, they are here: https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/log/?h=v7 Thanks! Alexei Starovoitov (28): net: filter: add load 64-bit immediate eBPF instruction net: filter: split filter.h and expose eBPF to user space bpf: introduce syscall(BPF, ...) 
and BPF maps bpf: enable bpf syscall on x64 and i386 bpf: add lookup/update/delete/iterate methods to BPF maps bpf: add hashtable type of BPF maps bpf: expand BPF syscall with program load/unload bpf: handle pseudo BPF_CALL insn bpf: verifier (add docs) bpf: verifier (add ability to receive verification log) bpf: handle pseudo BPF_LD_IMM64 insn bpf: verifier (add branch/goto checks) bpf: verifier (add verifier core) bpf: verifier (add state prunning optimization) bpf: allow eBPF programs to use maps bpf: split eBPF out of NET tracing: allow eBPF programs to be attached to events tracing: allow eBPF programs call printk() tracing: allow eBPF programs to be attached to kprobe/kretprobe tracing: allow eBPF programs to call ktime_get_ns() and get_current() samples: bpf: add mini eBPF library to manipulate maps and programs samples: bpf: example of tracing filters with eBPF bpf: verifier test bpf: llvm backend samples: bpf: elf file loader samples: bpf: eBPF example in C samples: bpf: counting eBPF example in C samples: bpf: IO latency analysis (iosnoop/heatmap) Documentation/networking/filter.txt| 313 +++- arch/Kconfig |3 + arch/x86/net/bpf_jit_comp.c| 17 + arch/x86/syscalls/syscall_32.tbl |1 + arch/x86/syscalls/syscall_64.tbl |1 + fs/btrfs/super.c |3 + include/linux/bpf.h| 139 ++ include/linux/filter.h | 303 +--- include/linux/ftrace_event.h |5 + include/linux/syscalls.h |3 +- include/trace/bpf_trace.h | 23 + include/trace/ftrace.h | 25 + include/uapi/asm-generic/unistd.h |4 +- include/uapi/linux/Kbuild |1 + include/uapi/linux/bpf.h | 439 + kernel/Makefile|2 +- kernel/bpf/Makefile|2 +- kernel/bpf/core.c | 17 + kernel/bpf/hashtab.c | 365 kernel/bpf/syscall.c | 645 +++ kernel/bpf/verifier.c | 1910 kernel/sys_ni.c|3 + kernel/trace/Kconfig |1 + kernel/trace/Makefile |1 + kernel/trace/bpf_trace.c | 264 +++ kernel/trace/trace.h |3 + kernel/trace/trace_events.c| 41 +- kernel/trace/trace_events_filter.c | 72 +- kernel/trace/trace_kprobe.c| 28 + kernel/trace/trace_syscalls.c | 
32 + lib/test_bpf.c | 21 + net/Kconfig|1 + net/core/filter.c |2 + samples/bpf/Makefile | 28 +
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Aug 26, 2014 7:29 PM, Alexei Starovoitov a...@plumgrid.com wrote: Hi Ingo, David, posting whole thing again as RFC to get feedback on syscall only. If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok, I'll split them into small chunks as requested and will repost without RFC. IMO it's much easier to review a syscall if we just look at a specification of what it does. The code is, in some sense, secondary. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski l...@amacapital.net wrote: On Aug 26, 2014 7:29 PM, Alexei Starovoitov a...@plumgrid.com wrote: Hi Ingo, David, posting whole thing again as RFC to get feedback on syscall only. If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok, I'll split them into small chunks as requested and will repost without RFC. IMO it's much easier to review a syscall if we just look at a specification of what it does. The code is, in some sense, secondary. 'specification of what it does'... hmm, you mean beyond what's there in commit logs and in Documentation/networking/filter.txt ? Aren't samples at the end give an idea on 'what it does'? I'm happy to add 'specification', I just don't understand yet what it suppose to talk about beyond what's already written. I understand that the patches are missing explanation on 'why' the syscall is being added, but I don't think it's what you're asking... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 9:35 PM, Alexei Starovoitov a...@plumgrid.com wrote: On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski l...@amacapital.net wrote: On Aug 26, 2014 7:29 PM, Alexei Starovoitov a...@plumgrid.com wrote: Hi Ingo, David, posting whole thing again as RFC to get feedback on syscall only. If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok, I'll split them into small chunks as requested and will repost without RFC. IMO it's much easier to review a syscall if we just look at a specification of what it does. The code is, in some sense, secondary. 'specification of what it does'... hmm, you mean beyond what's there in commit logs and in Documentation/networking/filter.txt ? Aren't samples at the end give an idea on 'what it does'? I'm happy to add 'specification', I just don't understand yet what it suppose to talk about beyond what's already written. I understand that the patches are missing explanation on 'why' the syscall is being added, but I don't think it's what you're asking... I mean a hopefully short document that defines what the syscall does. It should be precise enough that one could, in principle, implement the syscall just by reading the document and that one could use the syscall just by reading the document. Given that there's a whole instruction set to go with it, it may end up being moderately complicated or saying things like see this other thing for a description of the instruction set and there are some extensible sets of functions you can call with it. --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC v7 net-next 00/28] BPF syscall
On Tue, Aug 26, 2014 at 9:49 PM, Andy Lutomirski l...@amacapital.net wrote:
> On Tue, Aug 26, 2014 at 9:35 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>> On Tue, Aug 26, 2014 at 8:56 PM, Andy Lutomirski l...@amacapital.net wrote:
>>> On Aug 26, 2014 7:29 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>>>>
>>>> Hi Ingo, David,
>>>>
>>>> posting whole thing again as RFC to get feedback on syscall only.
>>>> If syscall bpf(int cmd, union bpf_attr *attr, unsigned int size) is ok,
>>>> I'll split them into small chunks as requested and will repost without
>>>> RFC.
>>>
>>> IMO it's much easier to review a syscall if we just look at a
>>> specification of what it does. The code is, in some sense, secondary.
>>
>> 'specification of what it does'... hmm, you mean beyond what's
>> there in commit logs and in Documentation/networking/filter.txt ?
>> Don't the samples at the end give an idea of 'what it does'?
>> I'm happy to add a 'specification', I just don't understand yet what
>> it is supposed to talk about beyond what's already written.
>> I understand that the patches are missing an explanation of 'why'
>> the syscall is being added, but I don't think that's what you're asking...
>
> I mean a hopefully short document that defines what the syscall does.
> It should be precise enough that one could, in principle, implement
> the syscall just by reading the document and that one could use the
> syscall just by reading the document.
>
> Given that there's a whole instruction set to go with it, it may end
> up being moderately complicated or saying things like "see this other
> thing for a description of the instruction set" and "there are some
> extensible sets of functions you can call with it".

I'm still lost.

Here is the quote from Documentation/networking/filter.txt

'maps' is a generic storage of different types for sharing data between kernel
and userspace.
The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes
  map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
  using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
  returns process-local file descriptor or negative error

- lookup key in a given map
  err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->value
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->value
  returns zero or negative error

- find and delete element by key in a given map
  err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key

- to delete map: close(fd)
  Exiting process will delete maps automatically

Userspace programs use this API to create/populate/read maps that eBPF programs
are concurrently updating.

and more in commit log:

- load eBPF program
  fd = bpf(BPF_PROG_LOAD, union bpf_attr *attr, u32 size)

  where 'attr' is
  struct {
      enum bpf_prog_type prog_type;
      __u32 insn_cnt;
      struct bpf_insn __user *insns;
      const char __user *license;
  };
  insns - array of eBPF instructions
  license - must be GPL compatible to call helper functions marked gpl_only

- unload eBPF program
  close(fd)

Isn't it short, and doesn't it describe what it does?
Do you want me to describe what an eBPF program can do?
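To make the return-value contract of the map commands above concrete, here is a minimal user-space model in C. It is only a sketch under stated assumptions, not the kernel implementation and not the syscall itself: the model_map_* names are hypothetical, a heap pointer stands in for the process-local file descriptor, the storage is a plain linear array, and the specific error codes (-ENOENT for a missing key, -E2BIG for a full map) are my assumption, since the quoted text only says "negative error".

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of one map; NOT kernel code. */
struct model_map {
    unsigned key_size, value_size, max_entries, count;
    unsigned char *slots; /* max_entries records: [used byte | key | value] */
};

static size_t slot_size(const struct model_map *m)
{
    return 1 + (size_t)m->key_size + m->value_size;
}

/* Models BPF_MAP_CREATE: a handle stands in for the fd, NULL for an error. */
struct model_map *model_map_create(unsigned key_size, unsigned value_size,
                                   unsigned max_entries)
{
    struct model_map *m;

    if (!key_size || !value_size || !max_entries)
        return NULL;
    m = calloc(1, sizeof(*m));
    if (!m)
        return NULL;
    m->key_size = key_size;
    m->value_size = value_size;
    m->max_entries = max_entries;
    m->slots = calloc(max_entries, slot_size(m));
    if (!m->slots) {
        free(m);
        return NULL;
    }
    return m;
}

/* Linear search for a used slot holding `key`; NULL if the key is absent. */
static unsigned char *find_slot(struct model_map *m, const void *key)
{
    assert(m && key);
    for (unsigned i = 0; i < m->max_entries; i++) {
        unsigned char *s = m->slots + i * slot_size(m);
        if (s[0] && !memcmp(s + 1, key, m->key_size))
            return s;
    }
    return NULL;
}

/* Models BPF_MAP_LOOKUP_ELEM: zero and value filled in, or negative error. */
int model_map_lookup(struct model_map *m, const void *key, void *value)
{
    unsigned char *s = find_slot(m, key);

    if (!s)
        return -ENOENT; /* assumed error code */
    memcpy(value, s + 1 + m->key_size, m->value_size);
    return 0;
}

/* Models BPF_MAP_UPDATE_ELEM: create or overwrite the key/value pair. */
int model_map_update(struct model_map *m, const void *key, const void *value)
{
    unsigned char *s = find_slot(m, key);

    if (!s) {
        if (m->count == m->max_entries)
            return -E2BIG; /* assumed error code for a full map */
        /* count < max_entries guarantees this loop finds a free slot. */
        for (unsigned i = 0; i < m->max_entries; i++) {
            s = m->slots + i * slot_size(m);
            if (!s[0])
                break;
        }
        s[0] = 1;
        memcpy(s + 1, key, m->key_size);
        m->count++;
    }
    memcpy(s + 1 + m->key_size, value, m->value_size);
    return 0;
}

/* Models BPF_MAP_DELETE_ELEM: find and delete the element by key. */
int model_map_delete(struct model_map *m, const void *key)
{
    unsigned char *s = find_slot(m, key);

    if (!s)
        return -ENOENT; /* assumed error code */
    s[0] = 0;
    m->count--;
    return 0;
}

/* Models close(fd): releases the map and everything stored in it. */
void model_map_close(struct model_map *m)
{
    free(m->slots);
    free(m);
}
```

A caller would create a map with fixed key and value sizes, then treat every command as "zero on success, negative errno on failure", which is the same shape the quoted text gives for the bpf() commands. The concurrency aspect (eBPF programs updating the map while userspace reads it) is deliberately left out of this model.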