On Mon, Mar 09, 2026 at 02:21:49PM -0400, Peter Xu wrote:
> On Mon, Mar 09, 2026 at 05:51:29PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Mar 09, 2026 at 05:42:08PM +0000, Tejus GK wrote:
> > > 
> > > 
> > > > On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <[email protected]> 
> > > > wrote:
> > > > 
> > > > !-------------------------------------------------------------------|
> > > >  CAUTION: External Email
> > > > 
> > > > |-------------------------------------------------------------------!
> > > > 
> > > > On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
> > > >> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
> > > >>>> @@ -881,8 +881,8 @@ static int 
> > > >>>> qio_channel_socket_flush_internal(QIOChannel *ioc,
> > > >>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > >>>> 
> > > >>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy 
> > > >>>> success */
> > > >>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > >>>> -            sioc->new_zero_copy_sent_success = true;
> > > >>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
> > > >>>> +            sioc->zero_copy_fallback++;
> > > >>> 
> > > >>> ...this is counting the number of MSG_ERRQUEUE items, which is not
> > > >>> the same as the number of IO requests. That's why we only used it
> > > >>> as a boolean marker originally, rather than making it a counter.
> > > >> 
> > > >> Would the logic still work and better than before?  Say, it's a 
> > > >> counter of
> > > >> "messages" rather than "IOs" then.
> > > > 
> > > > IIUC it is a counter of processing notifications which is not directly
> > > > correlated to any action by QEMU - neither bytes nor syscalls.
> > > 
> > > Please correct me if I'm wrong about this, isn’t each notification an 
> > > information 
> > > about what happened to an individual IO?
> > 
> > If userspace hasn't read a queued notification yet, the kernel will
> > merge new notifications with the existing queued one.
> > 
> > The line above your change
> > 
> >   serr->ee_data - serr->ee_info + 1;
> > 
> > records how many notifications were merged, so we now how many
> > syscalls were processed.
> > 
> > If ee_code is  SO_EE_CODE_ZEROCOPY_COPIED though it means at least
> > one syscall resulted in a copy, but that doesn't imply that *all*
> > syscalls resulted in a copy.
> > 
> > AFAICT, it could be 1 out of a 1000 syscalls resulted in a copy,
> > or it could be 1000 out of 1000 resulted in a copy. We don't know.
> > 
> > IIUC the kernel's merging of notifications appears lossy wrt this
> > information. It could be partially mitigated by doing a flush for
> > notifications really really frequently but that feels like it would
> > have its own downsides
> 
> IMHO what this change does is removing the false negatives.
> 
> Before this patch, if QEMU reports fallback=0, it doesn't mean all the
> MSG_ZEROCOPY requests were all fulfilled by zerocopy.  It's because we
> justify it with one boolean over "a period of time" between two flushes, we
> set the boolean to TRUE as long as there is _one_ successful report of
> MSG_ZEROCOPY.  So even if every flush reports TRUE it only means "there is
> at least one MSG_ZEROCOPY request that didn't fallback".  It has no
> implication of whether a fallback happened.
> 
> Hence, before this v2 patch, there can be false negative reported by QEMU,
> assuming there's no fallback (reflected in stats) but it actually happened.
> 
> After this patch, if QEMU reports fallback=0, it guarantees that _all_
> MSG_ZEROCOPY requests are fulfilled with zerocopy.  It's because we monitor
> all messages and accumulate any fallback cases.  Even if the messages can
> be merged, when "fallback" shows anything non-zero would imply some
> fallback happened.  Here, the counter value doesn't really matter much
> IMHO, as long as it becomes non-zero.

AFAICT, the v1 of this patch was sufficient to address the original
bug and maintain the current intended semantics of the migration
counter. This v2 is mixing a bug fix with functional change in
behaviour and I don't think the latter is justified.

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|


Reply via email to