On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > 
> > 
> > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <[email protected]> 
> > > wrote:
> > > 
> > > !-------------------------------------------------------------------|
> > >  CAUTION: External Email
> > > 
> > > |-------------------------------------------------------------------!
> > > 
> > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > >>>> @@ -881,8 +881,8 @@ static int 
> > >>>> qio_channel_socket_flush_internal(QIOChannel *ioc,
> > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > >>>> 
> > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy 
> > >>>> success */
> > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > >>>> -            sioc->new_zero_copy_sent_success = true;
> > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > >>>> +            sioc->zero_copy_fallback++;
> > >>> 
> > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > >>> the same as the number of IO requests. That's why we only used it
> > >>> as a boolean marker originally, rather than making it a counter.
> > >> 
> > >> Would the logic still work and better than before?  Say, it's a counter 
> > >> of
> > >> "messages" rather than "IOs" then.
> > > 
> > > IIUC it is a counter of processing notifications which is not directly
> > > correlated to any action by QEMU - neither bytes nor syscalls.
> > 
> > Please correct me if I'm wrong about this, isn’t each notification an 
> > information 
> > about what happened to an individual IO?
> 
> If userspace hasn't read a queued notification yet, the kernel will
> merge new notifications with the existing queued one.
> 
> The line above your change
> 
>   serr->ee_data - serr->ee_info + 1;
> 
> records how many notifications were merged, so we now how many
> syscalls were processed.
> 
> If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> one syscall resulted in a copy, but that doesn't imply that *all*
> syscalls resulted in a copy.
> 
> AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> or it could be 1000 out of 1000 resulted in a copy. We don't know.
> 
> IIUC the kernel's merging of notifications appears lossy wrt this
> information. It could be partially mitigated by doing a flush for
> notifications really really frequently but that feels like it would
> have its own downsides

IMHO what this change does is removing the false negatives.

Before this patch, if QEMU reports fallback=0, it doesn't mean all the
MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
justify it with one boolean over "a period of time" between two flushes, we
set the boolean to TRUE as long as there is _one_ successful report of
MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
at least one MSG_ZEROCOPY request that didn't fallback".  It has no
implication of whether a fallback happened.

Hence, before this v2 patch, there can be false negative reported by QEMU,
assuming there's no fallback (reflected in stats) but it actually happened.

After this patch, if QEMU reports fallback=0, it guarantees that _all_
MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
all messages and accumulate any fallback cases.  Even if the messages can
be merged, when "fallback" shows anything non-zero would imply some
fallback happened.  Here, the counter value doesn't really matter much
IMHO, as long as it becomes non-zero.

Thanks,

-- 
Peter Xu


Reply via email to