Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> > ways.  This code should never have passed review.  I'm refraining from
>> > saying more; see the commit messages instead.
>> > 
>> > Issues remaining after this series include:
>> > 
>> > * Terrible error messages
>> > 
>> > * Some error message cascades remain
>> > 
>> > * There is no written contract for QEMUFileHooks, and the
>> >   responsibility for reporting errors is unclear
>> 
>> Even being removed.. because no one is really extending that..
>> 
>> https://lore.kernel.org/all/20230509120700.78359-1-quint...@redhat.com/#t
>
> One day (in another 5-10 years) I still hope we'll get to
> the point where QEMUFile itself is obsolete :-)

If you see the patches on list, I have move rate limit check outside of
QEMUFile, so right now it is only a buffer to write in the main
migration thread.

> Getting
> rid of QEMUFileHooks is a great step in that direction.

Thanks.

> Me finishing a old PoC to bring buffering to QIOChannel
> would be another big step.

I want to get rid of qemu_file_set_error() and friends first.  We should
handle errors correctly when they happen, and go from there.

> The data rate limiting would be the biggest missing piece
> to enable migration/vmstate logic to directly consume
> a QIOChannel.

As said, that is done.  I have three atomic counters:

- qemu_file_bytes
- rdma_bytes
- multifd_bytes

We do the rate limiting adding that 3 counters.  The only thing that
QEMUFile does know is increase qemu_file_bytes when it needs to do it.

> Eliminating QEMUFile would help to bring Error **errp
> to all the vmstate codepaths.

Yes!

See the last problem on the list where they couldn't use
migrate_set_error() in vmstate.c because that breaks test-vmstate.c.

>> I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
>> wrong.
>
> In the MAINTAINERS file RDMA still get classified as formally
> supported under the migration maintainers.  I'm not convinced
> that is an accurate description of its status.  I tend to agree
> with you that it is 'odd fixes' at the very best.

It is not exact, we wanted to get rid of it.

> Dave Gilbert had previously speculated about whether we should
> even consider deprecating it on the basis that latest non-RDMA
> migration is too much better than in the past, with multifd
> and zerocopy, that RDMA might not even offer a significant
> enough peformance win to justify.

The main problem with RDMA is that it uses a really weird model for
migration point of view:
- everything there is asynchronous (nothing else is like that)
- it uses its own everything, i.e. abuses QEMUFile and QIOChannel to try
  to look like one, but it fails.
- Its error handling is "peculiar", to be friendly.  See this series
  from Markus to make it look normal.

Thanks, Juan.


Reply via email to