On 2022/6/14 10:14 PM, Dr. David Alan Gilbert wrote:
I don't think we can tell which one of them triggered the error; so the
only thing I can suggest is that we document the need for optmem_max
setting; I wonder how we get a better answer than 'a few 100KB'?
I guess it's something like the number of packets inflight *
sizeof(cmsghdr) ?
Dave
The official documentation describes three cases in which a send fails
with errno ENOBUFS
(https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html):
1. The socket option was not set
2. The socket exceeds its optmem limit
3. The user exceeds its ulimit on locked pages
For case 1, as long as the code logic is correct, this possibility can
be ruled out.
For case 2, I asked a kernel developer about the reasoning behind "a few
100KB". He said the recommended value is intended to improve the
performance of zero-copy sends. If the NIC sends data more slowly than
it is generated, sendmsg can still return with errno ENOBUFS even when
optmem is set to 100KB.
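To confirm which optmem limit is actually in effect before reproducing
this, the value can be read back from procfs. A minimal sketch in C; the
helper name read_sysctl_long is hypothetical, and it simply parses one
integer from whatever path it is given:

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a procfs-style file,
 * e.g. /proc/sys/net/core/optmem_max. Returns -1 if the file cannot be
 * opened or does not start with a number. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL) {
        return -1;
    }
    if (fscanf(f, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}
```

Calling read_sysctl_long("/proc/sys/net/core/optmem_max") on the
migration source host would then show the limit the socket is subject to.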
For case 3, if I do not set a max locked memory limit for qemu, the
limit is unlimited. When I did set a max locked memory limit for qemu, I
found that an OOM occurs as soon as the memory usage exceeds that limit.
Does this mean that sendmsg can never return with errno ENOBUFS because
the user exceeded its ulimit on locked pages?
If the above is true, can we treat the errno as always being case 2?
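For reference, the locked-memory limit a process is really running under
can be checked with getrlimit(RLIMIT_MEMLOCK). A minimal sketch; the
function name print_memlock_limit is made up for illustration:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value, mapping RLIM_INFINITY to "unlimited". */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY) {
        printf("%s: unlimited\n", label);
    } else {
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
    }
}

/* Hypothetical helper: print the soft and hard RLIMIT_MEMLOCK limits of
 * the calling process. Returns 0 on success, -1 if getrlimit() fails. */
int print_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit("soft", rl.rlim_cur);
    print_limit("hard", rl.rlim_max);
    return 0;
}
```

This only reports the limit; it does not show whether the kernel rejects
a zero-copy send because of it.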
I modified the code to call sendmsg again when the errno is ENOBUFS, and
set optmem back to the initial 20KB (echo 20480 >
/proc/sys/net/core/optmem_max). With that, the multifd zero-copy
migration works fine.
Here are the changes I made to the code:
Signed-off-by: chuang xu <xuchuangxc...@bytedance.com>
---
io/channel-socket.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/io/channel-socket.c b/io/channel-socket.c
index dc9c165de1..9267f55a1d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -595,9 +595,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 #ifdef QEMU_MSG_ZEROCOPY
         case ENOBUFS:
             if (sflags & MSG_ZEROCOPY) {
-                error_setg_errno(errp, errno,
-                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
-                return -1;
+                goto retry;
             }
             break;
 #endif
--
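To make the retry idea concrete outside of QEMU, here is a rough
standalone sketch. It is not the QEMU code: sendmsg_retry_enobufs is a
hypothetical helper, and the retry bound and the short sleep between
attempts are assumptions added for illustration (the patch above retries
unconditionally). The caller would pass MSG_ZEROCOPY in flags on a socket
that has SO_ZEROCOPY enabled.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>

/* Hypothetical helper, not part of QEMU: call sendmsg() and retry while
 * it fails with ENOBUFS, at most max_retries times, sleeping delay_ns
 * nanoseconds (0 <= delay_ns < 1000000000) between attempts so that
 * in-flight sends get a chance to complete. EINTR is retried without
 * counting against the bound. Returns the sendmsg() result, or -1 with
 * errno set by the last failing sendmsg() call. */
ssize_t sendmsg_retry_enobufs(int fd, const struct msghdr *msg, int flags,
                              int max_retries, long delay_ns)
{
    int attempt = 0;

    for (;;) {
        ssize_t ret = sendmsg(fd, msg, flags);

        if (ret >= 0) {
            return ret;
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno != ENOBUFS || attempt++ >= max_retries) {
            return -1;
        }

        struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };
        nanosleep(&ts, NULL);
    }
}
```

A bounded loop keeps a persistent ENOBUFS from turning into a busy spin;
whether a bound is wanted for migration is a separate question.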
Dave, what's your take?
Best Regards,
chuang xu