Daniel P. Berrangé <berra...@redhat.com> wrote: > On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote: >> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote: >> > Oh dear, where to start. There's so much wrong, and in pretty obvious >> > ways. This code should never have passed review. I'm refraining from >> > saying more; see the commit messages instead. >> > >> > Issues remaining after this series include: >> > >> > * Terrible error messages >> > >> > * Some error message cascades remain >> > >> > * There is no written contract for QEMUFileHooks, and the >> > responsibility for reporting errors is unclear >> >> Even being removed.. because no one is really extending that.. >> >> https://lore.kernel.org/all/20230509120700.78359-1-quint...@redhat.com/#t > > One day (in another 5-10 years) I still hope we'll get to > the point where QEMUFile itself is obsolete :-)
If you see the patches on list, I have move rate limit check outside of QEMUFile, so right now it is only a buffer to write in the main migration thread. > Getting > rid of QEMUFileHooks is a great step in that direction. Thanks. > Me finishing a old PoC to bring buffering to QIOChannel > would be another big step. I want to get rid of qemu_file_set_error() and friends first. We should handle errors correctly when they happen, and go from there. > The data rate limiting would be the biggest missing piece > to enable migration/vmstate logic to directly consume > a QIOChannel. As said, that is done. I have three atomic counters: - qemu_file_bytes - rdma_bytes - multifd_bytes We do the rate limiting adding that 3 counters. The only thing that QEMUFile does know is increase qemu_file_bytes when it needs to do it. > Eliminating QEMUFile would help to bring Error **errp > to all the vmstate codepaths. Yes! See the last problem on the list where they couldn't use migrate_set_error() in vmstate.c because that breaks test-vmstate.c. >> I always see rdma as "odd fixes" stage.. for a long time. But maybe I was >> wrong. > > In the MAINTAINERS file RDMA still get classified as formally > supported under the migration maintainers. I'm not convinced > that is an accurate description of its status. I tend to agree > with you that it is 'odd fixes' at the very best. It is not exact, we wanted to get rid of it. > Dave Gilbert had previously speculated about whether we should > even consider deprecating it on the basis that latest non-RDMA > migration is too much better than in the past, with multifd > and zerocopy, that RDMA might not even offer a significant > enough peformance win to justify. The main problem with RDMA is that it uses a really weird model for migration point of view: - everything there is asynchronous (nothing else is like that) - it uses its own everything, i.e. abuses QEMUFile and QIOChannel to try to look like one, but it fails. - Its error handling is "peculiar", to be friendly. See this series from Markus to make it look normal. Thanks, Juan.