Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-25 Thread Lukas Straub
On Mon, 25 Sep 2023 09:20:58 -0300
Fabiano Rosas  wrote:

> CC: Daniel for the QIOChannel discussion
> 
> Lukas Straub  writes:
> > On Thu, 14 Sep 2023 10:57:47 -0400
> > Peter Xu  wrote:
> >  
> >> On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote:  
> >> > Peter Xu  writes:
> >> > 
> >> > > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:
> >> > >> Peter Xu  writes:
> >> > >> 
> >> > >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
> >> > >> >> The core yank code is strict about balanced registering and
> >> > >> >> unregistering of yank functions.
> >> > >> >> 
> >> > >> >> This creates a difficulty because the migration code registers one
> >> > >> >> yank function per QIOChannel, but each QIOChannel can be 
> >> > >> >> referenced by
> >> > >> >> more than one QEMUFile. The yank function should not be removed 
> >> > >> >> until
> >> > >> >> all QEMUFiles have been closed.
> >> > >> >> 
> >> > >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
> >> > >> >> that has a yank function. Only unregister the yank function when 
> >> > >> >> all
> >> > >> >> QEMUFiles have been closed.
> >> > >> >> 
> >> > >> >> This improves the current code by removing the need for the 
> >> > >> >> programmer
> >> > >> >> to know which QEMUFile is the last one to be cleaned up and fixes 
> >> > >> >> the
> >> > >> >> theoretical issue of removing the yank function while another 
> >> > >> >> QEMUFile
> >> > >> >> could still be using the ioc and require a yank.
> >> > >> >> 
> >> > >> >> Signed-off-by: Fabiano Rosas 
> >> > >> >> ---
> >> > >> >>  migration/yank_functions.c | 81 
> >> > >> >> ++
> >> > >> >>  migration/yank_functions.h |  8 
> >> > >> >>  2 files changed, 81 insertions(+), 8 deletions(-)
> >> > >> >
> >> > >> > I worry this over-complicate things.
> >> > >> 
> >> > >> It does. We ran out of simple options.
> >> > >> 
> >> > >> > If you prefer the cleaness that we operate always on qemufile 
> >> > >> > level, can we
> >> > >> > just register each yank function per-qemufile?
> >> > >> 
> >> > >> "just" hehe
> >> > >> 
> >> > >> we could, but:
> >> > >> 
> >> > >> i) the yank is a per-channel operation, so this is even more 
> >> > >> unintuitive;
> >> > >
> >> > > I mean we can provide something like:
> >> > >
> >> > > void migration_yank_qemufile(void *opaque)
> >> > > {
> >> > > QEMUFile *file = opaque;
> >> > > QIOChannel *ioc = file->ioc;
> >> > >
> >> > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> >> > > }
> >> > >
> >> > > void migration_qemufile_register_yank(QEMUFile *file)
> >> > > {
> >> > > if (migration_ioc_yank_supported(file->ioc)) {
> >> > > yank_register_function(MIGRATION_YANK_INSTANCE,
> >> > >migration_yank_qemufile,
> >> > >file);
> >> > > }
> >> > > }
> >> > 
> >> > Sure, this is what I was thinking as well. IMO it will be yet another
> >> > operation that happens on the channel, but it performed via the
> >> > file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
> >> > world, of course, I just find it error-prone.
> >> > 
> >> > >> 
> >> > >> ii) multifd doesn't have a QEMUFile, so it will have to continue using
> >> > >> the ioc;
> >> > >
> >> > > We can keep using migration_ioc_[un]register_yank() for them if 
> >> > > there's no
> >> > > qemufile attached.  As long as the function will all be registered 
> >> > > under
> >> > > MIGRATION_YANK_INSTANCE we should be fine having different yank func.
> >> > >
> >> > 
> >> > ok
> >> > 
> >> > >> 
> >> > >> iii) we'll have to add a yank to every new QEMUFile created during the
> >> > >>  incoming migration (colo, rdma, etc), otherwise the incoming side
> >> > >>  will be left using iocs while the src uses the QEMUFile;
> >> > >
> >> > > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
> >> > > will be a noop for it for either reg/unreg.
> >> > >
> >> > > Currently it seems we will also unreg the ioc even for RDMA (even 
> >> > > though we
> >> > > don't reg for it).  But since unreg will be a noop it seems all fine 
> >> > > even
> >> > > if not paired.. maybe we should still try to pair it, e.g. register 
> >> > > also in
> >> > > rdma_start_outgoing_migration() for the rdma ioc so at least they're 
> >> > > paired.
> >> > >
> >> > > I don't see why COLO is special here, though.  Maybe I missed 
> >> > > something.
> >> > 
> >> > For colo I was thinking we'd have to register the yank just to be sure
> >> > that all paths unregistering it have something to unregister.
> >> > 
> >> > Maybe I should move the register into qemu_file_new_impl() with a
> >> > matching unregister at qemu_fclose().
> >> 
> >> Sounds good.  Or...
> >>   
> >> > 
> >> > >> 
> >> > >> iv) this is a functional change of the 

Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-25 Thread Fabiano Rosas
CC: Daniel for the QIOChannel discussion

Lukas Straub  writes:
> On Thu, 14 Sep 2023 10:57:47 -0400
> Peter Xu  wrote:
>
>> On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote:
>> > Peter Xu  writes:
>> >   
>> > > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:  
>> > >> Peter Xu  writes:
>> > >>   
>> > >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:  
>> > >> >> The core yank code is strict about balanced registering and
>> > >> >> unregistering of yank functions.
>> > >> >> 
>> > >> >> This creates a difficulty because the migration code registers one
>> > >> >> yank function per QIOChannel, but each QIOChannel can be referenced 
>> > >> >> by
>> > >> >> more than one QEMUFile. The yank function should not be removed until
>> > >> >> all QEMUFiles have been closed.
>> > >> >> 
>> > >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
>> > >> >> that has a yank function. Only unregister the yank function when all
>> > >> >> QEMUFiles have been closed.
>> > >> >> 
>> > >> >> This improves the current code by removing the need for the 
>> > >> >> programmer
>> > >> >> to know which QEMUFile is the last one to be cleaned up and fixes the
>> > >> >> theoretical issue of removing the yank function while another 
>> > >> >> QEMUFile
>> > >> >> could still be using the ioc and require a yank.
>> > >> >> 
>> > >> >> Signed-off-by: Fabiano Rosas 
>> > >> >> ---
>> > >> >>  migration/yank_functions.c | 81 
>> > >> >> ++
>> > >> >>  migration/yank_functions.h |  8 
>> > >> >>  2 files changed, 81 insertions(+), 8 deletions(-)  
>> > >> >
>> > >> > I worry this over-complicate things.  
>> > >> 
>> > >> It does. We ran out of simple options.
>> > >>   
>> > >> > If you prefer the cleaness that we operate always on qemufile level, 
>> > >> > can we
>> > >> > just register each yank function per-qemufile?  
>> > >> 
>> > >> "just" hehe
>> > >> 
>> > >> we could, but:
>> > >> 
>> > >> i) the yank is a per-channel operation, so this is even more 
>> > >> unintuitive;  
>> > >
>> > > I mean we can provide something like:
>> > >
>> > > void migration_yank_qemufile(void *opaque)
>> > > {
>> > > QEMUFile *file = opaque;
>> > > QIOChannel *ioc = file->ioc;
>> > >
>> > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
>> > > }
>> > >
>> > > void migration_qemufile_register_yank(QEMUFile *file)
>> > > {
>> > > if (migration_ioc_yank_supported(file->ioc)) {
>> > > yank_register_function(MIGRATION_YANK_INSTANCE,
>> > >migration_yank_qemufile,
>> > >file);
>> > > }
>> > > }  
>> > 
>> > Sure, this is what I was thinking as well. IMO it will be yet another
>> > operation that happens on the channel, but it performed via the
>> > file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
>> > world, of course, I just find it error-prone.
>> >   
>> > >> 
>> > >> ii) multifd doesn't have a QEMUFile, so it will have to continue using
>> > >> the ioc;  
>> > >
>> > > We can keep using migration_ioc_[un]register_yank() for them if there's 
>> > > no
>> > > qemufile attached.  As long as the function will all be registered under
>> > > MIGRATION_YANK_INSTANCE we should be fine having different yank func.
>> > >  
>> > 
>> > ok
>> >   
>> > >> 
>> > >> iii) we'll have to add a yank to every new QEMUFile created during the
>> > >>  incoming migration (colo, rdma, etc), otherwise the incoming side
>> > >>  will be left using iocs while the src uses the QEMUFile;  
>> > >
>> > > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
>> > > will be a noop for it for either reg/unreg.
>> > >
>> > > Currently it seems we will also unreg the ioc even for RDMA (even though 
>> > > we
>> > > don't reg for it).  But since unreg will be a noop it seems all fine even
>> > > if not paired.. maybe we should still try to pair it, e.g. register also 
>> > > in
>> > > rdma_start_outgoing_migration() for the rdma ioc so at least they're 
>> > > paired.
>> > >
>> > > I don't see why COLO is special here, though.  Maybe I missed something. 
>> > >  
>> > 
>> > For colo I was thinking we'd have to register the yank just to be sure
>> > that all paths unregistering it have something to unregister.
>> > 
>> > Maybe I should move the register into qemu_file_new_impl() with a
>> > matching unregister at qemu_fclose().  
>> 
>> Sounds good.  Or...
>> 
>> >   
>> > >> 
>> > >> iv) this is a functional change of the yank feature for which we have no
>> > >> tests.  
>> > >
>> > > Having yank tested should be preferrable.  Lukas is in the loop, let's 
>> > > see
>> > > whether he has something. We can still smoke test it before a selftest
>> > > being there.
>> > >
>
> Hi All,
> Sorry for the late reply.
>
> Yes, testing missing. I'll work on it.
>
>> > > Taking one step back.. I doubt whether anyone is using 

Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-25 Thread Lukas Straub
On Thu, 14 Sep 2023 10:57:47 -0400
Peter Xu  wrote:

> On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote:
> > Peter Xu  writes:
> >   
> > > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:  
> > >> Peter Xu  writes:
> > >>   
> > >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:  
> > >> >> The core yank code is strict about balanced registering and
> > >> >> unregistering of yank functions.
> > >> >> 
> > >> >> This creates a difficulty because the migration code registers one
> > >> >> yank function per QIOChannel, but each QIOChannel can be referenced by
> > >> >> more than one QEMUFile. The yank function should not be removed until
> > >> >> all QEMUFiles have been closed.
> > >> >> 
> > >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
> > >> >> that has a yank function. Only unregister the yank function when all
> > >> >> QEMUFiles have been closed.
> > >> >> 
> > >> >> This improves the current code by removing the need for the programmer
> > >> >> to know which QEMUFile is the last one to be cleaned up and fixes the
> > >> >> theoretical issue of removing the yank function while another QEMUFile
> > >> >> could still be using the ioc and require a yank.
> > >> >> 
> > >> >> Signed-off-by: Fabiano Rosas 
> > >> >> ---
> > >> >>  migration/yank_functions.c | 81 
> > >> >> ++
> > >> >>  migration/yank_functions.h |  8 
> > >> >>  2 files changed, 81 insertions(+), 8 deletions(-)  
> > >> >
> > >> > I worry this over-complicate things.  
> > >> 
> > >> It does. We ran out of simple options.
> > >>   
> > >> > If you prefer the cleaness that we operate always on qemufile level, 
> > >> > can we
> > >> > just register each yank function per-qemufile?  
> > >> 
> > >> "just" hehe
> > >> 
> > >> we could, but:
> > >> 
> > >> i) the yank is a per-channel operation, so this is even more 
> > >> unintuitive;  
> > >
> > > I mean we can provide something like:
> > >
> > > void migration_yank_qemufile(void *opaque)
> > > {
> > > QEMUFile *file = opaque;
> > > QIOChannel *ioc = file->ioc;
> > >
> > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > > }
> > >
> > > void migration_qemufile_register_yank(QEMUFile *file)
> > > {
> > > if (migration_ioc_yank_supported(file->ioc)) {
> > > yank_register_function(MIGRATION_YANK_INSTANCE,
> > >migration_yank_qemufile,
> > >file);
> > > }
> > > }  
> > 
> > Sure, this is what I was thinking as well. IMO it will be yet another
> > operation that happens on the channel, but it performed via the
> > file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
> > world, of course, I just find it error-prone.
> >   
> > >> 
> > >> ii) multifd doesn't have a QEMUFile, so it will have to continue using
> > >> the ioc;  
> > >
> > > We can keep using migration_ioc_[un]register_yank() for them if there's no
> > > qemufile attached.  As long as the function will all be registered under
> > > MIGRATION_YANK_INSTANCE we should be fine having different yank func.
> > >  
> > 
> > ok
> >   
> > >> 
> > >> iii) we'll have to add a yank to every new QEMUFile created during the
> > >>  incoming migration (colo, rdma, etc), otherwise the incoming side
> > >>  will be left using iocs while the src uses the QEMUFile;  
> > >
> > > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
> > > will be a noop for it for either reg/unreg.
> > >
> > > Currently it seems we will also unreg the ioc even for RDMA (even though 
> > > we
> > > don't reg for it).  But since unreg will be a noop it seems all fine even
> > > if not paired.. maybe we should still try to pair it, e.g. register also 
> > > in
> > > rdma_start_outgoing_migration() for the rdma ioc so at least they're 
> > > paired.
> > >
> > > I don't see why COLO is special here, though.  Maybe I missed something.  
> > 
> > For colo I was thinking we'd have to register the yank just to be sure
> > that all paths unregistering it have something to unregister.
> > 
> > Maybe I should move the register into qemu_file_new_impl() with a
> > matching unregister at qemu_fclose().  
> 
> Sounds good.  Or...
> 
> >   
> > >> 
> > >> iv) this is a functional change of the yank feature for which we have no
> > >> tests.  
> > >
> > > Having yank tested should be preferrable.  Lukas is in the loop, let's see
> > > whether he has something. We can still smoke test it before a selftest
> > > being there.
> > >

Hi All,
Sorry for the late reply.

Yes, testing missing. I'll work on it.

> > > Taking one step back.. I doubt whether anyone is using yank for migration?
> > > Knowing that migration already have migrate-cancel (for precopy) and
> > > migrate-pause (for postcopy).  
> > 
> > Right, both already call qio_channel_shutdown().
> >   
> > > I never used it myself, and I don't think
> > > it's 

Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-14 Thread Peter Xu
On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:
> >> Peter Xu  writes:
> >> 
> >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
> >> >> The core yank code is strict about balanced registering and
> >> >> unregistering of yank functions.
> >> >> 
> >> >> This creates a difficulty because the migration code registers one
> >> >> yank function per QIOChannel, but each QIOChannel can be referenced by
> >> >> more than one QEMUFile. The yank function should not be removed until
> >> >> all QEMUFiles have been closed.
> >> >> 
> >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
> >> >> that has a yank function. Only unregister the yank function when all
> >> >> QEMUFiles have been closed.
> >> >> 
> >> >> This improves the current code by removing the need for the programmer
> >> >> to know which QEMUFile is the last one to be cleaned up and fixes the
> >> >> theoretical issue of removing the yank function while another QEMUFile
> >> >> could still be using the ioc and require a yank.
> >> >> 
> >> >> Signed-off-by: Fabiano Rosas 
> >> >> ---
> >> >>  migration/yank_functions.c | 81 ++
> >> >>  migration/yank_functions.h |  8 
> >> >>  2 files changed, 81 insertions(+), 8 deletions(-)
> >> >
> >> > I worry this over-complicate things.
> >> 
> >> It does. We ran out of simple options.
> >> 
> >> > If you prefer the cleaness that we operate always on qemufile level, can 
> >> > we
> >> > just register each yank function per-qemufile?
> >> 
> >> "just" hehe
> >> 
> >> we could, but:
> >> 
> >> i) the yank is a per-channel operation, so this is even more unintuitive;
> >
> > I mean we can provide something like:
> >
> > void migration_yank_qemufile(void *opaque)
> > {
> > QEMUFile *file = opaque;
> > QIOChannel *ioc = file->ioc;
> >
> > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > }
> >
> > void migration_qemufile_register_yank(QEMUFile *file)
> > {
> > if (migration_ioc_yank_supported(file->ioc)) {
> > yank_register_function(MIGRATION_YANK_INSTANCE,
> >migration_yank_qemufile,
> >file);
> > }
> > }
> 
> Sure, this is what I was thinking as well. IMO it will be yet another
> operation that happens on the channel, but it performed via the
> file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
> world, of course, I just find it error-prone.
> 
> >> 
> >> ii) multifd doesn't have a QEMUFile, so it will have to continue using
> >> the ioc;
> >
> > We can keep using migration_ioc_[un]register_yank() for them if there's no
> > qemufile attached.  As long as the function will all be registered under
> > MIGRATION_YANK_INSTANCE we should be fine having different yank func.
> >
> 
> ok
> 
> >> 
> >> iii) we'll have to add a yank to every new QEMUFile created during the
> >>  incoming migration (colo, rdma, etc), otherwise the incoming side
> >>  will be left using iocs while the src uses the QEMUFile;
> >
> > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
> > will be a noop for it for either reg/unreg.
> >
> > Currently it seems we will also unreg the ioc even for RDMA (even though we
> > don't reg for it).  But since unreg will be a noop it seems all fine even
> > if not paired.. maybe we should still try to pair it, e.g. register also in
> > rdma_start_outgoing_migration() for the rdma ioc so at least they're paired.
> >
> > I don't see why COLO is special here, though.  Maybe I missed something.
> 
> For colo I was thinking we'd have to register the yank just to be sure
> that all paths unregistering it have something to unregister.
> 
> Maybe I should move the register into qemu_file_new_impl() with a
> matching unregister at qemu_fclose().

Sounds good.  Or...

> 
> >> 
> >> iv) this is a functional change of the yank feature for which we have no
> >> tests.
> >
> > Having yank tested should be preferrable.  Lukas is in the loop, let's see
> > whether he has something. We can still smoke test it before a selftest
> > being there.
> >
> > Taking one step back.. I doubt whether anyone is using yank for migration?
> > Knowing that migration already have migrate-cancel (for precopy) and
> > migrate-pause (for postcopy).
> 
> Right, both already call qio_channel_shutdown().
> 
> > I never used it myself, and I don't think
> > it's supported for RHEL.  How's that in suse's case?
> 
> Never heard mention of it and I don't see it in our virtualization
> documentation.
> 
> >
> > If no one is using it, maybe we can even avoid registering migration to
> > yank?
> >
> 
> Seems reasonable to me.

... let's wait for a few days from Lukas to see whether he as any more
input, or I'd vote for dropping yank for migration as a whole. It caused
mostly more crashes that I knew than 

Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-14 Thread Fabiano Rosas
Peter Xu  writes:

> On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
>> >> The core yank code is strict about balanced registering and
>> >> unregistering of yank functions.
>> >> 
>> >> This creates a difficulty because the migration code registers one
>> >> yank function per QIOChannel, but each QIOChannel can be referenced by
>> >> more than one QEMUFile. The yank function should not be removed until
>> >> all QEMUFiles have been closed.
>> >> 
>> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
>> >> that has a yank function. Only unregister the yank function when all
>> >> QEMUFiles have been closed.
>> >> 
>> >> This improves the current code by removing the need for the programmer
>> >> to know which QEMUFile is the last one to be cleaned up and fixes the
>> >> theoretical issue of removing the yank function while another QEMUFile
>> >> could still be using the ioc and require a yank.
>> >> 
>> >> Signed-off-by: Fabiano Rosas 
>> >> ---
>> >>  migration/yank_functions.c | 81 ++
>> >>  migration/yank_functions.h |  8 
>> >>  2 files changed, 81 insertions(+), 8 deletions(-)
>> >
>> > I worry this over-complicate things.
>> 
>> It does. We ran out of simple options.
>> 
>> > If you prefer the cleaness that we operate always on qemufile level, can we
>> > just register each yank function per-qemufile?
>> 
>> "just" hehe
>> 
>> we could, but:
>> 
>> i) the yank is a per-channel operation, so this is even more unintuitive;
>
> I mean we can provide something like:
>
> void migration_yank_qemufile(void *opaque)
> {
> QEMUFile *file = opaque;
> QIOChannel *ioc = file->ioc;
>
> qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> }
>
> void migration_qemufile_register_yank(QEMUFile *file)
> {
> if (migration_ioc_yank_supported(file->ioc)) {
> yank_register_function(MIGRATION_YANK_INSTANCE,
>migration_yank_qemufile,
>file);
> }
> }

Sure, this is what I was thinking as well. IMO it will be yet another
operation that happens on the channel, but it performed via the
file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
world, of course, I just find it error-prone.

>> 
>> ii) multifd doesn't have a QEMUFile, so it will have to continue using
>> the ioc;
>
> We can keep using migration_ioc_[un]register_yank() for them if there's no
> qemufile attached.  As long as the function will all be registered under
> MIGRATION_YANK_INSTANCE we should be fine having different yank func.
>

ok

>> 
>> iii) we'll have to add a yank to every new QEMUFile created during the
>>  incoming migration (colo, rdma, etc), otherwise the incoming side
>>  will be left using iocs while the src uses the QEMUFile;
>
> For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
> will be a noop for it for either reg/unreg.
>
> Currently it seems we will also unreg the ioc even for RDMA (even though we
> don't reg for it).  But since unreg will be a noop it seems all fine even
> if not paired.. maybe we should still try to pair it, e.g. register also in
> rdma_start_outgoing_migration() for the rdma ioc so at least they're paired.
>
> I don't see why COLO is special here, though.  Maybe I missed something.

For colo I was thinking we'd have to register the yank just to be sure
that all paths unregistering it have something to unregister.

Maybe I should move the register into qemu_file_new_impl() with a
matching unregister at qemu_fclose().

>> 
>> iv) this is a functional change of the yank feature for which we have no
>> tests.
>
> Having yank tested should be preferrable.  Lukas is in the loop, let's see
> whether he has something. We can still smoke test it before a selftest
> being there.
>
> Taking one step back.. I doubt whether anyone is using yank for migration?
> Knowing that migration already have migrate-cancel (for precopy) and
> migrate-pause (for postcopy).

Right, both already call qio_channel_shutdown().

> I never used it myself, and I don't think
> it's supported for RHEL.  How's that in suse's case?

Never heard mention of it and I don't see it in our virtualization
documentation.

>
> If no one is using it, maybe we can even avoid registering migration to
> yank?
>

Seems reasonable to me.

>> 
>> If that's all ok to you I'll go ahead and git it a try.
>> 
>> > I think qmp yank will simply fail the 2nd call on the qemufile if the
>> > iochannel is shared with the other one, but that's totally fine, IMHO.
>> >
>> > What do you think?
>> >
>> > In all cases, we should probably at least merge patch 1-8 if that can
>> > resolve the CI issue.  I think all of them are properly reviewed.
>> 
>> I agree. Someone needs to queue this though since Juan has been busy.
>
> Yes, I'll see what I can do.

Thanks. I could 

Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-13 Thread Peter Xu
On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
> >> The core yank code is strict about balanced registering and
> >> unregistering of yank functions.
> >> 
> >> This creates a difficulty because the migration code registers one
> >> yank function per QIOChannel, but each QIOChannel can be referenced by
> >> more than one QEMUFile. The yank function should not be removed until
> >> all QEMUFiles have been closed.
> >> 
> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
> >> that has a yank function. Only unregister the yank function when all
> >> QEMUFiles have been closed.
> >> 
> >> This improves the current code by removing the need for the programmer
> >> to know which QEMUFile is the last one to be cleaned up and fixes the
> >> theoretical issue of removing the yank function while another QEMUFile
> >> could still be using the ioc and require a yank.
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  migration/yank_functions.c | 81 ++
> >>  migration/yank_functions.h |  8 
> >>  2 files changed, 81 insertions(+), 8 deletions(-)
> >
> > I worry this over-complicate things.
> 
> It does. We ran out of simple options.
> 
> > If you prefer the cleaness that we operate always on qemufile level, can we
> > just register each yank function per-qemufile?
> 
> "just" hehe
> 
> we could, but:
> 
> i) the yank is a per-channel operation, so this is even more unintuitive;

I mean we can provide something like:

void migration_yank_qemufile(void *opaque)
{
QEMUFile *file = opaque;
QIOChannel *ioc = file->ioc;

qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
}

void migration_qemufile_register_yank(QEMUFile *file)
{
if (migration_ioc_yank_supported(file->ioc)) {
yank_register_function(MIGRATION_YANK_INSTANCE,
   migration_yank_qemufile,
   file);
}
}

> 
> ii) multifd doesn't have a QEMUFile, so it will have to continue using
> the ioc;

We can keep using migration_ioc_[un]register_yank() for them if there's no
qemufile attached.  As long as the function will all be registered under
MIGRATION_YANK_INSTANCE we should be fine having different yank func.

> 
> iii) we'll have to add a yank to every new QEMUFile created during the
>  incoming migration (colo, rdma, etc), otherwise the incoming side
>  will be left using iocs while the src uses the QEMUFile;

For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
will be a noop for it for either reg/unreg.

Currently it seems we will also unreg the ioc even for RDMA (even though we
don't reg for it).  But since unreg will be a noop it seems all fine even
if not paired.. maybe we should still try to pair it, e.g. register also in
rdma_start_outgoing_migration() for the rdma ioc so at least they're paired.

I don't see why COLO is special here, though.  Maybe I missed something.

> 
> iv) this is a functional change of the yank feature for which we have no
> tests.

Having yank tested should be preferrable.  Lukas is in the loop, let's see
whether he has something. We can still smoke test it before a selftest
being there.

Taking one step back.. I doubt whether anyone is using yank for migration?
Knowing that migration already have migrate-cancel (for precopy) and
migrate-pause (for postcopy).  I never used it myself, and I don't think
it's supported for RHEL.  How's that in suse's case?

If no one is using it, maybe we can even avoid registering migration to
yank?

> 
> If that's all ok to you I'll go ahead and git it a try.
> 
> > I think qmp yank will simply fail the 2nd call on the qemufile if the
> > iochannel is shared with the other one, but that's totally fine, IMHO.
> >
> > What do you think?
> >
> > In all cases, we should probably at least merge patch 1-8 if that can
> > resolve the CI issue.  I think all of them are properly reviewed.
> 
> I agree. Someone needs to queue this though since Juan has been busy.

Yes, I'll see what I can do.

-- 
Peter Xu




Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-13 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
>> The core yank code is strict about balanced registering and
>> unregistering of yank functions.
>> 
>> This creates a difficulty because the migration code registers one
>> yank function per QIOChannel, but each QIOChannel can be referenced by
>> more than one QEMUFile. The yank function should not be removed until
>> all QEMUFiles have been closed.
>> 
>> Keep a reference count of how many QEMUFiles are using a QIOChannel
>> that has a yank function. Only unregister the yank function when all
>> QEMUFiles have been closed.
>> 
>> This improves the current code by removing the need for the programmer
>> to know which QEMUFile is the last one to be cleaned up and fixes the
>> theoretical issue of removing the yank function while another QEMUFile
>> could still be using the ioc and require a yank.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  migration/yank_functions.c | 81 ++
>>  migration/yank_functions.h |  8 
>>  2 files changed, 81 insertions(+), 8 deletions(-)
>
> I worry this over-complicate things.

It does. We ran out of simple options.

> If you prefer the cleaness that we operate always on qemufile level, can we
> just register each yank function per-qemufile?

"just" hehe

we could, but:

i) the yank is a per-channel operation, so this is even more unintuitive;

ii) multifd doesn't have a QEMUFile, so it will have to continue using
the ioc;

iii) we'll have to add a yank to every new QEMUFile created during the
 incoming migration (colo, rdma, etc), otherwise the incoming side
 will be left using iocs while the src uses the QEMUFile;

iv) this is a functional change of the yank feature for which we have no
tests.

If that's all ok to you I'll go ahead and git it a try.

> I think qmp yank will simply fail the 2nd call on the qemufile if the
> iochannel is shared with the other one, but that's totally fine, IMHO.
>
> What do you think?
>
> In all cases, we should probably at least merge patch 1-8 if that can
> resolve the CI issue.  I think all of them are properly reviewed.

I agree. Someone needs to queue this though since Juan has been busy.



Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-13 Thread Peter Xu
On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
> The core yank code is strict about balanced registering and
> unregistering of yank functions.
> 
> This creates a difficulty because the migration code registers one
> yank function per QIOChannel, but each QIOChannel can be referenced by
> more than one QEMUFile. The yank function should not be removed until
> all QEMUFiles have been closed.
> 
> Keep a reference count of how many QEMUFiles are using a QIOChannel
> that has a yank function. Only unregister the yank function when all
> QEMUFiles have been closed.
> 
> This improves the current code by removing the need for the programmer
> to know which QEMUFile is the last one to be cleaned up and fixes the
> theoretical issue of removing the yank function while another QEMUFile
> could still be using the ioc and require a yank.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  migration/yank_functions.c | 81 ++
>  migration/yank_functions.h |  8 
>  2 files changed, 81 insertions(+), 8 deletions(-)

I worry this over-complicate things.

If you prefer the cleaness that we operate always on qemufile level, can we
just register each yank function per-qemufile?  I think qmp yank will
simply fail the 2nd call on the qemufile if the iochannel is shared with
the other one, but that's totally fine, IMHO.

What do you think?

In all cases, we should probably at least merge patch 1-8 if that can
resolve the CI issue.  I think all of them are properly reviewed.

Thanks,

-- 
Peter Xu




[PATCH v6 09/10] migration/yank: Keep track of registered yank instances

2023-09-11 Thread Fabiano Rosas
The core yank code is strict about balanced registering and
unregistering of yank functions.

This creates a difficulty because the migration code registers one
yank function per QIOChannel, but each QIOChannel can be referenced by
more than one QEMUFile. The yank function should not be removed until
all QEMUFiles have been closed.

Keep a reference count of how many QEMUFiles are using a QIOChannel
that has a yank function. Only unregister the yank function when all
QEMUFiles have been closed.

This improves the current code by removing the need for the programmer
to know which QEMUFile is the last one to be cleaned up and fixes the
theoretical issue of removing the yank function while another QEMUFile
could still be using the ioc and require a yank.

Signed-off-by: Fabiano Rosas 
---
 migration/yank_functions.c | 81 ++
 migration/yank_functions.h |  8 
 2 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/migration/yank_functions.c b/migration/yank_functions.c
index 979e60c762..afdeef30ff 100644
--- a/migration/yank_functions.c
+++ b/migration/yank_functions.c
@@ -10,9 +10,32 @@
 #include "qemu/osdep.h"
 #include "io/channel.h"
 #include "yank_functions.h"
+#include "qemu/lockable.h"
 #include "qemu/yank.h"
 #include "qemu-file.h"
 
+static QemuMutex ioc_list_lock;
+static QLIST_HEAD(, Yankable) yankable_ioc_list
+= QLIST_HEAD_INITIALIZER(yankable_ioc_list);
+
+static void __attribute__((constructor)) ioc_list_lock_init(void)
+{
+qemu_mutex_init(_list_lock);
+}
+
+static void yankable_ref(Yankable *yankable)
+{
+assert(yankable->refcnt > 0);
+yankable->refcnt++;
+assert(yankable->refcnt < INT_MAX);
+}
+
+static void yankable_unref(Yankable *yankable)
+{
+assert(yankable->refcnt > 0);
+yankable->refcnt--;
+}
+
 void migration_yank_iochannel(void *opaque)
 {
 QIOChannel *ioc = QIO_CHANNEL(opaque);
@@ -28,20 +51,62 @@ static bool migration_ioc_yank_supported(QIOChannel *ioc)
 
 void migration_ioc_register_yank(QIOChannel *ioc)
 {
-if (migration_ioc_yank_supported(ioc)) {
-yank_register_function(MIGRATION_YANK_INSTANCE,
-   migration_yank_iochannel,
-   ioc);
+Yankable *new, *entry;
+
+if (!ioc || !migration_ioc_yank_supported(ioc)) {
+return;
 }
+
+WITH_QEMU_LOCK_GUARD(_list_lock) {
+QLIST_FOREACH(entry, _ioc_list, next) {
+if (entry->opaque == ioc) {
+yankable_ref(entry);
+return;
+}
+}
+
+new = g_new0(Yankable, 1);
+new->refcnt = 1;
+new->opaque = ioc;
+object_ref(ioc);
+
+QLIST_INSERT_HEAD(_ioc_list, new, next);
+}
+
+yank_register_function(MIGRATION_YANK_INSTANCE,
+   migration_yank_iochannel,
+   ioc);
 }
 
 void migration_ioc_unregister_yank(QIOChannel *ioc)
 {
-if (migration_ioc_yank_supported(ioc)) {
-yank_unregister_function(MIGRATION_YANK_INSTANCE,
- migration_yank_iochannel,
- ioc);
+Yankable *entry, *tmp;
+
+if (!ioc || !migration_ioc_yank_supported(ioc)) {
+return;
 }
+
+WITH_QEMU_LOCK_GUARD(_list_lock) {
+QLIST_FOREACH_SAFE(entry, _ioc_list, next, tmp) {
+if (entry->opaque == ioc) {
+yankable_unref(entry);
+
+if (!entry->refcnt) {
+QLIST_REMOVE(entry, next);
+g_free(entry);
+goto unreg;
+}
+}
+}
+}
+
+return;
+
+unreg:
+yank_unregister_function(MIGRATION_YANK_INSTANCE,
+ migration_yank_iochannel,
+ ioc);
+object_unref(ioc);
 }
 
 void migration_ioc_unregister_yank_from_file(QEMUFile *file)
diff --git a/migration/yank_functions.h b/migration/yank_functions.h
index a7577955ed..c928a46f68 100644
--- a/migration/yank_functions.h
+++ b/migration/yank_functions.h
@@ -7,6 +7,14 @@
  * See the COPYING file in the top-level directory.
  */
 
+struct Yankable {
+void *opaque;
+int refcnt;
+QLIST_ENTRY(Yankable) next;
+};
+
+typedef struct Yankable Yankable;
+
 /**
  * migration_yank_iochannel: yank function for iochannel
  *
-- 
2.35.3