On Tue, Nov 10, 2020 at 4:23 AM Jason Wang <jasow...@redhat.com> wrote:

>
> On 2020/11/9 9:33 PM, Yuri Benditovich wrote:
> >
> >
> > On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasow...@redhat.com> wrote:
> >
> >
> >     On 2020/11/5 11:13 PM, Yuri Benditovich wrote:
> >     > First of all, thank you for all your feedback
> >     >
> >     > Please help me summarize, so that we understand better what to
> >     > do in v2.
> >     > Major questions are:
> >     > 1. Building eBPF from source during the QEMU build vs. regenerating
> >     > it on demand and keeping it in the repository
> >     > Solution 1a (~ as in v1): keep instructions or ELF in an H file,
> >     > generate it outside of the QEMU build. In general we'll need to have
> >     > BE and LE binaries.
> >     > Solution 1b: build ELF or instructions during the QEMU build if llvm +
> >     > clang exist. Then we will have only one (BE or LE, depending on
> >     > the current QEMU build)
> >     > We agree with any solution - I believe you know the requirements
> >     > better.
> >
> >
> >     I think we can go with 1a. (See Daniel's comment)
> >
> >
> >     >
> >     > 2. Use libbpf or not
> >     > In general we do not see any advantage of using libbpf. It works
> >     > with object files (does ELF parsing at time of loading), but it does
> >     > not do any magic.
> >     > Solution 2a. Switch to libbpf, generate object files (LE and BE)
> >     > from source, keep them inside QEMU (~8k each) or aside
> >
> >
> >     Can we simply use dynamic linking here?
> >
> >
> > Can you please explain where exactly you suggest using dynamic linking?
>
>
> Yes. If I understand your 2a properly, you meant static linking of
> libbpf. So what I want to ask is the possibility of dynamic linking of
> libbpf here.
>
>
As Daniel explained above, QEMU is always linked dynamically against libraries.
Also, I see that the libbpf package does not even contain the static library.
If the build environment contains libbpf, libbpf.so becomes a runtime
dependency, just as with other libs.


>
> >
> >     > Solution 2b. (as in v1) Use python script to parse object ->
> >     > instructions (~2k each)
> >     > We'd prefer not to use libbpf at the moment.
> >     > If for some reason we find it useful in the future, we can
> >     > switch to it; this does not create any incompatibility. It will
> >     > then create a dependency on libbpf.so
> >
> >
> >     I think we need to care about compatibility. E.g. we need to enable
> >     BTF, and I don't know how hard it would be to add BTF support in the
> >     current design. It would probably be OK if it's not a lot of effort.
> >
> >
> > As far as we understand, BTF helps in BPF debugging and libbpf supports
> > it as is.
> > Without libbpf, in v1 we load the BPF instructions only.
> > If you think BTF is mandatory (BTW, why?), I think it is better to
> > switch to libbpf and keep the entire ELF in the qemu data.
>
>
> It is used to make sure the BPF program can be compiled once and run
> everywhere.
>
> This is explained in detail here:
>
> https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
>
>
Thank you. Then there is no question: we need to use libbpf.
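
To make the combination discussed in this thread (1a + 3a, LE only, libbpf)
a bit more concrete, here is a minimal sketch of a sanity check that could
run on the embedded object before it is handed to libbpf. It only verifies
that the blob is a 64-bit little-endian ELF whose machine type is BPF
(EM_BPF, 247). The function name ebpf_blob_is_le_bpf_elf() and the idea of
calling it before loading are assumptions for illustration, not part of the
v1 patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and values from the ELF header layout (e_ident and e_machine). */
enum {
    EI_CLASS_OFF    = 4,    /* e_ident[EI_CLASS] */
    EI_DATA_OFF     = 5,    /* e_ident[EI_DATA] */
    E_MACHINE_OFF   = 18,   /* 16-bit e_machine, follows e_ident and e_type */
    ELF_CLASS_64    = 2,    /* ELFCLASS64 */
    ELF_DATA_LSB    = 1,    /* ELFDATA2LSB, i.e. little-endian */
    ELF_MACHINE_BPF = 247,  /* EM_BPF */
    ELF_MIN_HDR     = 20    /* bytes needed to reach the end of e_machine */
};

/*
 * Hypothetical helper: returns true if buf looks like a 64-bit
 * little-endian ELF object built for the BPF target.
 */
static bool ebpf_blob_is_le_bpf_elf(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    uint16_t machine;

    if (buf == NULL || len < ELF_MIN_HDR) {
        return false;
    }
    if (memcmp(buf, magic, sizeof(magic)) != 0) {
        return false;
    }
    if (buf[EI_CLASS_OFF] != ELF_CLASS_64 ||
        buf[EI_DATA_OFF] != ELF_DATA_LSB) {
        return false;
    }
    /* e_machine is stored little-endian once EI_DATA says LSB. */
    machine = (uint16_t)(buf[E_MACHINE_OFF] |
                         ((uint16_t)buf[E_MACHINE_OFF + 1] << 8));
    return machine == ELF_MACHINE_BPF;
}
```

With libbpf, a buffer that passes this check would presumably then go to
bpf_object__open_mem() followed by bpf_object__load(); that part is left out
because it needs libbpf and kernel support at runtime.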


> Thanks
>
>
> >
> >
> >     >
> >     > 3. Keep instructions or ELF inside QEMU or as a separate external
> >     > file
> >     > Solution 3a (~as in v1): Built-in array of instructions or ELF.
> >     > If we generate them outside of the QEMU build - keep 2 arrays of
> >     > instructions or ELF (BE and LE).
> >     > Solution 3b: Install them as separate files (/usr/share/qemu).
> >     > We'd prefer 3a:
> >     >  Then there is a guarantee that the eBPF is built with exactly the
> >     > same config structures as QEMU (qemu creates a mapping of its
> >     > structures, eBPF uses them).
> >     >  No need to take care of scenarios like 'file not found', 'file
> >     > is not suitable' etc
> >
> >
> >     Yes, let's go with 3a for upstream.
> >
> >
> >     >
> >     > 4. Is there some real request to have the eBPF for big-endian?
> >     > If not, we can enable eBPF only for LE builds
> >
> >
> >     We can go with LE first.
> >
> >     Thanks
> >
> >
> >     >
> >     > Jason, Daniel, Michael
> >     > Can you please let us know what you think and why?
> >     >
> >     > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé
> >     > <berra...@redhat.com> wrote:
> >     >
> >     >     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> >     >     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> >     >     > >
> >     >     > > On 2020/11/4 5:31 PM, Daniel P. Berrangé wrote:
> >     >     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> >     >     > > > > On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
> >     >     > > > > >
> >     >     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
> >     >     > > > > > <jasow...@redhat.com> wrote:
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> >     >     > > > > >      > Basic idea is to use eBPF to calculate and steer packets in TAP.
> >     >     > > > > >      > RSS (Receive Side Scaling) is used to distribute network packets
> >     >     > > > > >      > to guest virtqueues by calculating packet hash.
> >     >     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> >     >     > > > > >      >
> >     >     > > > > >      > This set of patches introduces the usage of eBPF for packet steering
> >     >     > > > > >      > and RSS hash calculation:
> >     >     > > > > >      > * RSS (Receive Side Scaling) is used to distribute network packets to
> >     >     > > > > >      > guest virtqueues by calculating packet hash
> >     >     > > > > >      > * eBPF RSS is supposed to be faster than the already existing 'software'
> >     >     > > > > >      > implementation in QEMU
> >     >     > > > > >      > * Additionally adding support for the usage of RSS with vhost
> >     >     > > > > >      >
> >     >     > > > > >      > Supported kernels: 5.8+
> >     >     > > > > >      >
> >     >     > > > > >      > Implementation notes:
> >     >     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> >     >     > > > > >      > Added eBPF support to qemu directly through a system call, see
> >     >     > > > > >      > bpf(2) for details.
> >     >     > > > > >      > The eBPF program is part of qemu and is presented as an array of bpf
> >     >     > > > > >      > instructions.
> >     >     > > > > >      > The program can be recompiled with the provided Makefile.ebpf (need to
> >     >     > > > > >      > adjust 'linuxhdrs'),
> >     >     > > > > >      > although it's not required to build QEMU with eBPF support.
> >     >     > > > > >      > Added changes to virtio-net and vhost; primarily eBPF RSS is used.
> >     >     > > > > >      > 'Software' RSS is used in the case of hash population and as a
> >     >     > > > > >      > fallback option.
> >     >     > > > > >      > For vhost, the hash population feature is not reported to the guest.
> >     >     > > > > >      >
> >     >     > > > > >      > Please also see the documentation in PATCH 6/6.
> >     >     > > > > >      >
> >     >     > > > > >      > I am sending those patches as RFC to initiate the discussions
> >     >     > > > > >      > and get feedback on the following points:
> >     >     > > > > >      > * Fallback when eBPF is not supported by the kernel
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Yes, and it could also be a lack of CAP_BPF.
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      > * Live migration to a kernel that doesn't have eBPF support
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Is there anything that needs special treatment here?
> >     >     > > > > >
> >     >     > > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> >     >     > > > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> >     >     > > > > > functions, but all the steering does not use proper queues.
> >     >     > > > >
> >     >     > > > > Right, I think we need to disable vhost on dest.
> >     >     > > > >
> >     >     > > > >
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      > * Integration with current QEMU build
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Yes, a question here:
> >     >     > > > > >
> >     >     > > > > >      1) Any reason for not using libbpf, e.g. it has been shipped
> >     >     > > > > >      with some distros
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > > We intentionally do not use libbpf, as it is present only on some distros.
> >     >     > > > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> >     >     > > > > > installed
> >     >     > > > >
> >     >     > > > > That's better I think.
> >     >     > > > >
> >     >     > > > >
> >     >     > > > > >      2) It would be better if we can avoid shipping bytecodes
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > > This creates new dependencies: llvm + clang + ...
> >     >     > > > > > We would prefer byte code and ability to generate it if prerequisites
> >     >     > > > > > are installed.
> >     >     > > > >
> >     >     > > > > It's probably ok if we treat the bytecode as a kind of firmware.
> >     >     > > > That is explicitly *not* OK for inclusion in Fedora. They require that
> >     >     > > > BPF is compiled from source, and rejected my suggestion that it could
> >     >     > > > be considered a kind of firmware and thus have an exception from building
> >     >     > > > from source.
> >     >     > >
> >     >     > >
> >     >     > > Please refer to what was done in DPDK:
> >     >     > >
> >     >     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> >     >     > >
> >     >     > > I don't think what is proposed here is any different.
> >     >     >
> >     >     > I'm not convinced that what DPDK does is acceptable to Fedora either
> >     >     > based on the responses I've received when asking about BPF handling
> >     >     > during build.  It wouldn't surprise me, however, if this was simply
> >     >     > missed by reviewers when accepting DPDK into Fedora, because it is
> >     >     > not entirely obvious unless you are looking closely.
> >     >
> >     >     FWIW, I'm pushing back against the idea that we have to compile the
> >     >     BPF code from master source, as I think it is reasonable to have the
> >     >     program embedded as a static array in the source code similar to what
> >     >     DPDK does.  It doesn't feel much different from other places where apps
> >     >     use generated sources, and don't build them from the original source
> >     >     every time. E.g. "configure" is never re-generated from "configure.ac"
> >     >     by Fedora packagers, they just use the generated "configure" script
> >     >     as-is.
> >     >
> >     >     Regards,
> >     >     Daniel
> >     >     --
> >     >     |: https://berrange.com     -o-   https://www.flickr.com/photos/dberrange :|
> >     >     |: https://libvirt.org        -o-   https://fstop138.berrange.com :|
> >     >     |: https://entangle-photo.org   -o-   https://www.instagram.com/dberrange :|
> >     >
> >
>
>
