Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
On Wed, Sep 20, 2023 at 05:04:12PM +0800, Li Zhijian wrote: > From: Li Zhijian > > Previously, we got a confusion error that complains > the RDMAControlHeader.repeat: > qemu-system-x86_64: rdma: Too many requests in this message > (3638950032).Bailing. > > Actually, it's caused by an unexpected RDMAControlHeader.type. > After this patch, error will become: > qemu-system-x86_64: Unknown control message QEMU FILE > > Signed-off-by: Li Zhijian Reviewed-by: Peter Xu -- Peter Xu
Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
"Zhijian Li (Fujitsu)" writes: > On 20/09/2023 21:01, Fabiano Rosas wrote: >> Li Zhijian writes: >> >>> From: Li Zhijian >>> >>> Previously, we got a confusion error that complains >>> the RDMAControlHeader.repeat: >>> qemu-system-x86_64: rdma: Too many requests in this message >>> (3638950032).Bailing. >>> >>> Actually, it's caused by an unexpected RDMAControlHeader.type. >>> After this patch, error will become: >>> qemu-system-x86_64: Unknown control message QEMU FILE >>> >>> Signed-off-by: Li Zhijian >>> --- >>> migration/rdma.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/migration/rdma.c b/migration/rdma.c >>> index a2a3db35b1..3073d9953c 100644 >>> --- a/migration/rdma.c >>> +++ b/migration/rdma.c >>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel >>> *ioc, >>> size_t remaining = iov[i].iov_len; >>> uint8_t * data = (void *)iov[i].iov_base; >>> while (remaining) { >>> -RDMAControlHeader head; >>> +RDMAControlHeader head = {}; >>> >>> len = MIN(remaining, RDMA_SEND_INCREMENT); >>> remaining -= len; >> > > 2815 RDMAControlHeader head = {}; > 2816 > 2817 len = MIN(remaining, RDMA_SEND_INCREMENT); > 2818 remaining -= len; > 2819 > 2820 head.len = len; > 2821 head.type = RDMA_CONTROL_QEMU_FILE; > 2822 > 2823 ret = qemu_rdma_exchange_send(rdma, , data, NULL, NULL, > NULL); > >> I'm struggling to see how head is used before we set the type a couple >> of lines below. Could you expand on it? > > > IIUC, head is used for both common migration control path and RDMA specific > control path. > > hook_stage(RAM_SAVE_FLAG_HOOK) { > rdma_hook_process(qemu_rdma_registration_handle) { >do { >// this is a RDMA own control block, should not be disturbed by > the common migration control path. >// head will be extracted and processed here. >// qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, > which is an unexpected message for this block. >// head.repeat will be examined before the type, so an > uninitialized repeat will confuse us here. >} while (!RDMA_CONTROL_REGISTER_FINISHED || !error) > } > } > > > when qio_channel_rdma_writev() is used for common migration control path, > repeat is useless and will not be examined. > > With this patch, we can quickly know the cause. > Ah, right. Somehow I interpreted the commit message as meaning the 'type' field was bogus. But it's the 'repeat' field that causes the issue. Thanks for the explanation. Reviewed-by: Fabiano Rosas
Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
On 20/09/2023 21:01, Fabiano Rosas wrote: > Li Zhijian writes: > >> From: Li Zhijian >> >> Previously, we got a confusion error that complains >> the RDMAControlHeader.repeat: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> Actually, it's caused by an unexpected RDMAControlHeader.type. >> After this patch, error will become: >> qemu-system-x86_64: Unknown control message QEMU FILE >> >> Signed-off-by: Li Zhijian >> --- >> migration/rdma.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index a2a3db35b1..3073d9953c 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc, >> size_t remaining = iov[i].iov_len; >> uint8_t * data = (void *)iov[i].iov_base; >> while (remaining) { >> -RDMAControlHeader head; >> +RDMAControlHeader head = {}; >> >> len = MIN(remaining, RDMA_SEND_INCREMENT); >> remaining -= len; > 2815 RDMAControlHeader head = {}; 2816 2817 len = MIN(remaining, RDMA_SEND_INCREMENT); 2818 remaining -= len; 2819 2820 head.len = len; 2821 head.type = RDMA_CONTROL_QEMU_FILE; 2822 2823 ret = qemu_rdma_exchange_send(rdma, , data, NULL, NULL, NULL); > I'm struggling to see how head is used before we set the type a couple > of lines below. Could you expand on it? IIUC, head is used for both common migration control path and RDMA specific control path. hook_stage(RAM_SAVE_FLAG_HOOK) { rdma_hook_process(qemu_rdma_registration_handle) { do { // this is a RDMA own control block, should not be disturbed by the common migration control path. // head will be extracted and processed here. // qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, which is an unexpected message for this block. // head.repeat will be examined before the type, so an uninitialized repeat will confuse us here. } while (!RDMA_CONTROL_REGISTER_FINISHED || !error) } } when qio_channel_rdma_writev() is used for common migration control path, repeat is useless and will not be examined. With this patch, we can quickly know the cause. > > Also, a smoke test could have caught both issues early on. Is there any > reason for not having any? i have no idea yet :) Thanks Zhijian
Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
Li Zhijian writes: > From: Li Zhijian > > Previously, we got a confusion error that complains > the RDMAControlHeader.repeat: > qemu-system-x86_64: rdma: Too many requests in this message > (3638950032).Bailing. > > Actually, it's caused by an unexpected RDMAControlHeader.type. > After this patch, error will become: > qemu-system-x86_64: Unknown control message QEMU FILE > > Signed-off-by: Li Zhijian > --- > migration/rdma.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index a2a3db35b1..3073d9953c 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc, > size_t remaining = iov[i].iov_len; > uint8_t * data = (void *)iov[i].iov_base; > while (remaining) { > -RDMAControlHeader head; > +RDMAControlHeader head = {}; > > len = MIN(remaining, RDMA_SEND_INCREMENT); > remaining -= len; I'm struggling to see how head is used before we set the type a couple of lines below. Could you expand on it? Also, a smoke test could have caught both issues early on. Is there any reason for not having any?
[PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
From: Li Zhijian Previously, we got a confusion error that complains the RDMAControlHeader.repeat: qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing. Actually, it's caused by an unexpected RDMAControlHeader.type. After this patch, error will become: qemu-system-x86_64: Unknown control message QEMU FILE Signed-off-by: Li Zhijian --- migration/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/rdma.c b/migration/rdma.c index a2a3db35b1..3073d9953c 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc, size_t remaining = iov[i].iov_len; uint8_t * data = (void *)iov[i].iov_base; while (remaining) { -RDMAControlHeader head; +RDMAControlHeader head = {}; len = MIN(remaining, RDMA_SEND_INCREMENT); remaining -= len; -- 2.31.1