On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote: > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berra...@redhat.com > > > > wrote: > > > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > > > Added a function to check the stamp in the helper. > > > > eBPF helper should have a special symbol that generates during the > build. > > > > QEMU checks the helper and determines that it fits, so the helper > > > > will produce proper output. > > > > > > I think this is quite limiting for in place upgrades. > > > > > > Consider this scenario > > > > > > * Host has QEMU 8.1.0 installed > > > * VM is running QEMU 8.1.0 > > > * QEMU 8.1.1 is released with a bug fix in the EBF program > > > * Host is upgraded to QEMU 8.1.1 > > > * User attempts to hotplug a NIC to the running VM > > > > > > IIUC this last step is going to fail because we'll be loading > > > the EBF program from 8.1.1 and so its hash is different from > > > that expected by the QEMU 8.1.0 that the pre-existing VM is > > > running. > > > > > > Indeed we did not take in account the in-place upgrade. > > > > > > > > > If some changes to the EBF program are not going to be back > > > compatible from the POV of the QEMU process, should we instead > > > be versioning the EBF program. eg so new QEMU will ship both > > > the old and new versions of the EBF program. > > > > This does not seem to be an elegant option: QEMU theoretically can > include > > different eBPF programs but it hardly can interface with each one of > them. > > The code of QEMU (access to eBPF maps etc) includes header files which > eBPF > > of the day is being built with them. > > > > I see 2 options to address this issue (of course there are more) > > 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will > always > > have a correct name, so for the running instance it will use > > qemu-rss-helper-<old-hash>, for the new instance it will use > > qemu-rss-helper-<new-hash> > > We'll get an ever growing number of program variants we need to > build & distribute with each new QEMU release. > New release of the qemu-rss-helper-<new-hash> will be created in fact only when the eBPF binary is updated. This does not happen on each release. But yes, this looks like versioning of all the shared libraries. > > > 2. Build the helper executable and link it inside qemu as a blob. Libvirt > > will always retrieve the executable to the temporary file name and use > it. > > So the retrieved helper will always be compatible with QEMU. I'm not sure > > what is the most portable way to do that. > > QEMU is considered an untrusted process, so there's no way we're going > to ask it to give us an ELF binary and then execute that in privileged > context. > > > Does one of these seem suitable? > > Neither feels very appealing to me. > > I've been trying to understand the eBPF code we're dealing with in a > little more detail. > > IIUC, QEMU, or rather the virtio-net driver needs to receive one FD > for the BPF program, and one or more FDs for the BPF maps that the > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of > the program FD. > > The helper program that is proposed here calls ebpf_rss_load() to > load the program and get back a struct which gives access to the > 4 FDs, which are then sent to the mgmt app, which forwards them > onto QEMU. > > The ebpf_rss_load() method is making use of various structs that > are specific to the RSS program implementation, but does not seems > to do anything especially interesting. It calls into rss_bpf__open() > which eventually gets around to calling rss_bpf__create_skeleton > which is where the interesting stuff happens. > > This rss_bpf__create_skeleton() method is implemented in terms of > totally generic libbpf APIs, and has the actual blob that is the > BPF program. > > Looking at what this does, I feel it should be trivial for a mgmt > app to implement equivalent logic to rss_bpf__create_skeleton in a > generic manner, if we could just expose the program blob and the > map names to the mgmt app. eg a simple json file > > { > "maps": [ > "tap_rss_map_configurations", > "tap_rss_map_indirection_table", > "tap_rss_map_toeplitz_key", > ], > "program": "....the big blob encoded in base64..." > } > > if we installed that file are /usr/share/qemu/bpf/net-rss.json > then when a QEMU process is started, the mgmt app capture the > data in this JSON file. It now has enough info to create the > EBPF programs needed and pass the FDs over to QEMU. This would > be robust against QEMU software upgrades, and not tied to the > specific EBPF program imlp. We can add or remove maps / change > their names etc any time, as the details in the JSON descriptor > can be updated. This avoids need for any special helper program > to be provided by QEMU with the problems that is throwing up > for us. > If I understand correctly, the libvirt will have the same problem in the in-place update scenario with the JSON file Let's say that there is no virtio-net device at the initial start and the libvirt does not need to load the JSON file. On the later hot plug of virtio-net it will go to the JSON file after the update, correct? > > What am I missing ? This seems pretty straightforward to > achieve from what I see. > > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- > https://www.instagram.com/dberrange :| > >