On Fri, Sep 26, 2025 at 02:39:43AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 28/08/2025 04:59, Peter Xu wrote:
> > The old RDMA's io_create_watch() isn't really doing much work anyway.  For
> > G_IO_OUT, it already does return immediately.  For G_IO_IN, it will try to
> > detect some RDMA context length however normally nobody will be able to set
> > it at all.
> > 
> 
> 
> First, RDMA migration works well with this patch applied.
> 
> Tested-by: Li Zhijian <[email protected]>

Thanks a lot, Zhijian.

> 
> 
> I have a small question. While testing, I didn't observe any callers to
> qio_channel_rdma_create_watch() during a complete RDMA migration using
> the default capabilities and parameters.
> I was wondering in which case this function is expected to be called?
> (I see io_create_watch() is mandatory for QIOChannelClass)

Yes, that's also my observation.  See my reply to Fabiano on the same patch
for some information.

A summary of what I said there, but more focused on what you're asking: IIUC
we currently almost always rely on qemu_rdma_wait_comp_channel() to poll
the two rdma fds, and yield if necessary when in a coroutine.

IOW, I don't know when qio_channel_rdma_create_watch(), or in most cases,
qio_channel_wait(), will be used at all.  I have a feeling that if it is
used it might get stuck forever (as the gsource will be monitoring
control_len, see below [1], while IIUC only the thread itself can update
it, or am I wrong?).  But I'm not fluent in the RDMA codebase.  Maybe
you'll have a better picture after seeing what I said here and there.

This patch is mostly there to guarantee that this can't happen: for
whatever could return QIO_CHANNEL_ERR_BLOCK for rdma channels, I want to
make sure it immediately retries instead of hanging forever in the temp
main loop of qio_channel_wait().

> 
> 
> Thanks
> Zhijian
> 
> 
> > Simplify the code so that RDMA iochannels simply always rely on synchronous
> > reads and writes.  It is highly likely what 6ddd2d76ca6f86f was talking
> > about, that the async model isn't really working well.
> > 
> > This helps because this is almost the only dependency that the migration
> > core would need a coroutine for rdma channels.
> > 
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> >   migration/rdma.c | 69 +++---------------------------------------------
> >   1 file changed, 3 insertions(+), 66 deletions(-)
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index ed4e20b988..bcd7aae2f2 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -2789,56 +2789,14 @@ static gboolean
> >   qio_channel_rdma_source_prepare(GSource *source,
> >                                   gint *timeout)
> >   {
> > -    QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> > -    RDMAContext *rdma;
> > -    GIOCondition cond = 0;
> >       *timeout = -1;
> > -
> > -    RCU_READ_LOCK_GUARD();
> > -    if (rsource->condition == G_IO_IN) {
> > -        rdma = qatomic_rcu_read(&rsource->rioc->rdmain);
> > -    } else {
> > -        rdma = qatomic_rcu_read(&rsource->rioc->rdmaout);
> > -    }
> > -
> > -    if (!rdma) {
> > -        error_report("RDMAContext is NULL when prepare Gsource");
> > -        return FALSE;
> > -    }
> > -
> > -    if (rdma->wr_data[0].control_len) {
> > -        cond |= G_IO_IN;
> > -    }
> > -    cond |= G_IO_OUT;
> > -
> > -    return cond & rsource->condition;
> > +    return TRUE;
> >   }
> >   
> >   static gboolean
> >   qio_channel_rdma_source_check(GSource *source)
> >   {
> > -    QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> > -    RDMAContext *rdma;
> > -    GIOCondition cond = 0;
> > -
> > -    RCU_READ_LOCK_GUARD();
> > -    if (rsource->condition == G_IO_IN) {
> > -        rdma = qatomic_rcu_read(&rsource->rioc->rdmain);
> > -    } else {
> > -        rdma = qatomic_rcu_read(&rsource->rioc->rdmaout);
> > -    }
> > -
> > -    if (!rdma) {
> > -        error_report("RDMAContext is NULL when check Gsource");
> > -        return FALSE;
> > -    }
> > -
> > -    if (rdma->wr_data[0].control_len) {

[1]

> > -        cond |= G_IO_IN;
> > -    }
> > -    cond |= G_IO_OUT;
> > -
> > -    return cond & rsource->condition;
> > +    return TRUE;
> >   }
> >   
> >   static gboolean
> > @@ -2848,29 +2806,8 @@ qio_channel_rdma_source_dispatch(GSource *source,
> >   {
> >       QIOChannelFunc func = (QIOChannelFunc)callback;
> >       QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
> > -    RDMAContext *rdma;
> > -    GIOCondition cond = 0;
> > -
> > -    RCU_READ_LOCK_GUARD();
> > -    if (rsource->condition == G_IO_IN) {
> > -        rdma = qatomic_rcu_read(&rsource->rioc->rdmain);
> > -    } else {
> > -        rdma = qatomic_rcu_read(&rsource->rioc->rdmaout);
> > -    }
> > -
> > -    if (!rdma) {
> > -        error_report("RDMAContext is NULL when dispatch Gsource");
> > -        return FALSE;
> > -    }
> > -
> > -    if (rdma->wr_data[0].control_len) {
> > -        cond |= G_IO_IN;
> > -    }
> > -    cond |= G_IO_OUT;
> >   
> > -    return (*func)(QIO_CHANNEL(rsource->rioc),
> > -                   (cond & rsource->condition),
> > -                   user_data);
> > +    return (*func)(QIO_CHANNEL(rsource->rioc), rsource->condition, user_data);
> >   }
> >   
> >   static void

-- 
Peter Xu

