On Tue, Nov 10, 2020 at 4:23 AM Jason Wang <jasow...@redhat.com> wrote:
> On 2020/11/9 9:33 PM, Yuri Benditovich wrote:
> > On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasow...@redhat.com> wrote:
> > > On 2020/11/5 11:13 PM, Yuri Benditovich wrote:
> > > > First of all, thank you for all your feedback.
> > > >
> > > > Please help me to summarize and let us understand better what we
> > > > do in v2:
> > > > Major questions are:
> > > > 1. Building eBPF from source during qemu build vs. regenerating it on
> > > > demand and keeping in the repository
> > > > Solution 1a (~ as in v1): keep instructions or ELF in H file, generate
> > > > it out of qemu build. In general we'll need to have BE and LE binaries.
> > > > Solution 1b: build ELF or instructions during QEMU build if llvm +
> > > > clang exist. Then we will have only one (BE or LE, depending on
> > > > current QEMU build)
> > > > We agree with any solution - I believe you know the requirements better.
> > >
> > > I think we can go with 1a. (See Daniel's comment)
> > >
> > > > 2. Use libbpf or not
> > > > In general we do not see any advantage of using libbpf. It works with
> > > > object files (does ELF parsing at time of loading), but it does not do
> > > > any magic.
> > > > Solution 2a. Switch to libbpf, generate object files (LE and BE) from
> > > > source, keep them inside QEMU (~8k each) or aside
> > >
> > > Can we simply use dynamic linking here?
> >
> > Can you please explain, where exactly you suggest to use dynamic linking?
>
> Yes. If I understand your 2a properly, you meant static linking of
> libbpf. So what I want to ask is the possibility of dynamic linking of
> libbpf here.

As Daniel explained above, QEMU is always linked dynamically against
libraries. Also I see the libbpf package does not even contain the static
library. If the build environment contains libbpf, the libbpf.so becomes
a runtime dependency, just as with other libs.

> > > > Solution 2b. (as in v1) Use python script to parse object ->
> > > > instructions (~2k each)
> > > > We'd prefer not to use libbpf at the moment.
> > > > If due to some unknown reason we'll find it useful in future, we can
> > > > switch to it, this does not create any incompatibility. Then this will
> > > > create a dependency on libbpf.so
> > >
> > > I think we need to care about compatibility. E.g. we need to enable BTF,
> > > so I don't know how hard it would be to add BTF support in the current
> > > design. It would probably be OK if it's not a lot of effort.
> >
> > As far as we understand BTF helps in BPF debugging and libbpf supports
> > it as is.
> > Without libbpf we in v1 load the BPF instructions only.
> > If you think the BTF is mandatory (BTW, why?) I think it is better to
> > switch to libbpf and keep the entire ELF in the qemu data.
>
> It is used to make sure the BPF program can be compiled once and run
> everywhere.
>
> This is explained in detail here:
> https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html

Thank you, then there is no question, we need to use libbpf.

> Thanks
>
> > > > 3. Keep instructions or ELF inside QEMU or as separate external file
> > > > Solution 3a (~as in v1): Built-in array of instructions or ELF. If we
> > > > generate them out of QEMU build - keep 2 arrays of instructions or ELF
> > > > (BE and LE),
> > > > Solution 3b: Install them as separate files (/usr/share/qemu).
> > > > We'd prefer 3a:
> > > > Then there is a guarantee that the eBPF is built with exactly the
> > > > same config structures as QEMU (qemu creates a mapping of its
> > > > structures, eBPF uses them).
> > > > No need to take care of scenarios like 'file not found', 'file is not
> > > > suitable' etc
> > >
> > > Yes, let's go 3a for upstream.
> > >
> > > > 4. Is there some real request to have the eBPF for big-endian?
> > > > If no, we can enable eBPF only for LE builds
> > >
> > > We can go with LE first.
> > >
> > > Thanks
> > >
> > > > Jason, Daniel, Michael
> > > > Can you please let us know what you think and why?
> > > >
> > > > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
> > > > > On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> > > > > > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> > > > > > > On 2020/11/4 5:31 PM, Daniel P. Berrangé wrote:
> > > > > > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > > > > > > > On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
> > > > > > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasow...@redhat.com> wrote:
> > > > > > > > > > > On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> > > > > > > > > > > > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > > > > > > > > > > RSS(Receive Side Scaling) is used to distribute network packets
> > > > > > > > > > > > to guest virtqueues by calculating packet hash.
> > > > > > > > > > > > eBPF RSS allows us to use RSS with vhost TAP.
> > > > > > > > > > > >
> > > > > > > > > > > > This set of patches introduces the usage of eBPF for packet steering
> > > > > > > > > > > > and RSS hash calculation:
> > > > > > > > > > > > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > > > > > > > > > >   guest virtqueues by calculating packet hash
> > > > > > > > > > > > * eBPF RSS is supposed to be faster than the already existing
> > > > > > > > > > > >   'software' implementation in QEMU
> > > > > > > > > > > > * Additionally adding support for the usage of RSS with vhost
> > > > > > > > > > > >
> > > > > > > > > > > > Supported kernels: 5.8+
> > > > > > > > > > > >
> > > > > > > > > > > > Implementation notes:
> > > > > > > > > > > > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > > > > > > > > > > Added eBPF support to qemu directly through a system call, see the
> > > > > > > > > > > > bpf(2) for details.
> > > > > > > > > > > > The eBPF program is part of the qemu and presented as an array of bpf
> > > > > > > > > > > > instructions.
> > > > > > > > > > > > The program can be recompiled by provided Makefile.ebpf(need to adjust
> > > > > > > > > > > > 'linuxhdrs'),
> > > > > > > > > > > > although it's not required to build QEMU with eBPF support.
> > > > > > > > > > > > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > > > > > > > > > > 'Software' RSS used in the case of hash population and as a
> > > > > > > > > > > > fallback option.
> > > > > > > > > > > > For vhost, the hash population feature is not reported to the guest.
> > > > > > > > > > > >
> > > > > > > > > > > > Please also see the documentation in PATCH 6/6.
> > > > > > > > > > > >
> > > > > > > > > > > > I am sending those patches as RFC to initiate the discussions and get
> > > > > > > > > > > > feedback on the following points:
> > > > > > > > > > > > * Fallback when eBPF is not supported by the kernel
> > > > > > > > > > >
> > > > > > > > > > > Yes, and it could also be a lack of CAP_BPF.
> > > > > > > > > > >
> > > > > > > > > > > > * Live migration to the kernel that doesn't have eBPF support
> > > > > > > > > > >
> > > > > > > > > > > Is there anything that needs special treatment here?
> > > > > > > > > >
> > > > > > > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > > > > > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > > > > > > > > > functions, but all the steering does not use proper queues.
> > > > > > > > >
> > > > > > > > > Right, I think we need to disable vhost on dest.
> > > > > > > > >
> > > > > > > > > > > > * Integration with current QEMU build
> > > > > > > > > > >
> > > > > > > > > > > Yes, a question here:
> > > > > > > > > > >
> > > > > > > > > > > 1) Any reason for not using libbpf, e.g. it has been shipped with some
> > > > > > > > > > > distros
> > > > > > > > > >
> > > > > > > > > > We intentionally do not use libbpf, as it is present only on some distros.
> > > > > > > > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > > > > > > > > installed
> > > > > > > > >
> > > > > > > > > That's better I think.
> > > > > > > > >
> > > > > > > > > > > 2) It would be better if we can avoid shipping bytecodes
> > > > > > > > > >
> > > > > > > > > > This creates new dependencies: llvm + clang + ...
> > > > > > > > > > We would prefer byte code and ability to generate it if prerequisites
> > > > > > > > > > are installed.
> > > > > > > > >
> > > > > > > > > It's probably ok if we treat the bytecode as a kind of firmware.
> > > > > > > >
> > > > > > > > That is explicitly *not* OK for inclusion in Fedora. They require that
> > > > > > > > BPF is compiled from source, and rejected my suggestion that it could
> > > > > > > > be considered a kind of firmware and thus have an exception from building
> > > > > > > > from source.
> > > > > > >
> > > > > > > Please refer to what was done in DPDK:
> > > > > > >
> > > > > > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> > > > > > >
> > > > > > > I don't think what is proposed here is any different.
> > > > > >
> > > > > > I'm not convinced that what DPDK does is acceptable to Fedora either
> > > > > > based on the responses I've received when asking about BPF handling
> > > > > > during build. It wouldn't surprise me, however, if this was simply
> > > > > > missed by reviewers when accepting DPDK into Fedora, because it is
> > > > > > not entirely obvious unless you are looking closely.
> > > > >
> > > > > FWIW, I'm pushing back against the idea that we have to compile the
> > > > > BPF code from master source, as I think it is reasonable to have the
> > > > > program embedded as a static array in the source code similar to what
> > > > > DPDK does. It doesn't feel much different from other places where apps
> > > > > use generated sources, and don't build them from the original source
> > > > > every time. eg "configure" is never re-generated from "configure.ac"
> > > > > by Fedora packagers, they just use the generated "configure" script
> > > > > as-is.
> > > > >
> > > > > Regards,
> > > > > Daniel
> > > > > --
> > > > > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > > > > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > > > > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
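
For readers who want a concrete picture of the RSS steering that the
quoted cover letter describes (hash the packet, then pick a guest
virtqueue), here is a minimal Python sketch. It is not the eBPF program
from the patches. It assumes the Toeplitz hash that RSS implementations
commonly use (the thread itself does not name the hash function), an
arbitrary 40-byte example key, and a simple modulo lookup into an
indirection table. The names toeplitz_hash, rss_input_ipv4_tcp and
select_queue are hypothetical and chosen only for illustration.

```python
import ipaddress
import struct

# Arbitrary 40-byte example key (an assumption; a real device gets its key
# from the guest driver configuration).
EXAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b4"
    "77cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of `data` under `key`."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        # Test input bit i, counting from the most significant bit of data[0].
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit window of the key that starts at key bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input_ipv4_tcp(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Build the hash input: source IP, destination IP, source port, destination port."""
    return (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!HH", sport, dport)
    )


def select_queue(hash_value: int, indirection_table: list) -> int:
    """Map a hash to a queue index (modulo lookup is an assumption of this sketch)."""
    return indirection_table[hash_value % len(indirection_table)]


if __name__ == "__main__":
    # Hypothetical flow and a 4-queue indirection table with 8 entries.
    flow = rss_input_ipv4_tcp("192.0.2.1", "192.0.2.2", 12345, 80)
    table = [0, 1, 2, 3, 0, 1, 2, 3]
    h = toeplitz_hash(EXAMPLE_KEY, flow)
    print(f"hash=0x{h:08x} queue={select_queue(h, table)}")
```

In the design discussed in the thread, these two steps would run either
in the eBPF program attached to the TAP device or in QEMU's 'software'
RSS fallback; the sketch only illustrates the arithmetic.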