On Mon, Jun 13, 2022 at 05:58:44PM -0300, Leonardo Bras Soares Passos wrote: > Hello Peter, > > On Wed, Jun 8, 2022 at 5:23 PM Peter Xu <pet...@redhat.com> wrote: > [...] > > > In a previous iteration of the patchset, it was made clear that it's > > > desirable to detect when the kernel falls back to copying mechanism, > > > so the user of 'QIOChannelSocket' can switch to copying and avoid the > > > overhead. This was done by the return value of flush(), which is 1 if > > > that occurs. > > > > Two questions.. > > > > 1) When that happens, will MSG_ERRQUEUE keeps working just like zerocopy > > is functional? > > I am not sure about what exactly you meant by 'like zerocopy is > funcional', but the > idea is that reading from MSG_ERRQUEUE should return a msg for each sendmsg > syscall with MSG_ZEROCOPY that previously happened. This does not depend on > the outcome (like falling back to the copying mechanism). > btw, most of those messages may be batched to reduce overhead. > > At some point, zero-copy may fail, and fall back to copying, so in > those messages > an error code SO_EE_CODE_ZEROCOPY_COPIED can be seen. Having only > those messages in a flush will trigger the returning of 1 from the > flush function.
Ah I think I missed the "reset ret==0 when !SO_EE_CODE_ZEROCOPY_COPIED" path.. Sorry. > > > > > If the answer is yes, I don't see how ret=1 will ever be > > returned.. because we'll also go into the same loop in > > qio_channel_socket_flush() anyway. > > > We set ret to 1 at function entry and then for each message in the > MSG_ERRQUEUE, > we test if it has error code different than SO_EE_CODE_ZEROCOPY_COPIED. > If it ever have a different error code, we set ret=0. > > So, in our previous example, if we have a net device not supporting > the 'Scatter-Gather' > feature (NETIF_F_SG), every error message will be > SO_EE_CODE_ZEROCOPY_COPIED, and it will return 1. > > > > > > If the answer is no, then since we'll have non-zero zero_copy_queued, > > will the loop in qio_channel_socket_flush() go into a dead one? How > > could it return? > > No, because it will go through all packets sent with MSG_ZEROCOPY, including > the > ones that fell back to copying, so the counter should be fine. If any > code disables > zero-copy, it will both stop sending stuff wil MSG_ZEROCOPY and flushing, so > it > should be fine. > > > > > 2) Even if we have the correct ret=1 returned when that happens, which > > caller is detecting that ret==1 and warn the admin? > > > > No caller is using that right now. > It's supposed to be a QIOChannel interface feature, and any > user/implementation > could use that information to warn if zero-copy is not being used, fall back > to > copying directly (to avoid overhead of testing zero-copy) or even use > it to cancel the > sending if wanted. > > It was a suggestion of Daniel on top of [PATCH v5 1/6] IIRC. OK the detection makes sense, thanks for the details. Then now I'm wondering whether we should have warned the admin already if zero-copy send is not fully enabled in live migration. Should we add a error_report_once() somewhere for the ret==1 already? After all the user specify zero_copy_send=true explicitly. Did I miss something again? Thanks, -- Peter Xu