> On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <[email protected]> wrote:
>
> On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
>> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
>>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
>>>>          sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
>>>>
>>>>          /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
>>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
>>>> -            sioc->new_zero_copy_sent_success = true;
>>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
>>>> +            sioc->zero_copy_fallback++;
>>>
>>> ...this is counting the number of MSG_ERRQUEUE items, which is not
>>> the same as the number of IO requests. That's why we only used it
>>> as a boolean marker originally, rather than making it a counter.
>>
>> Would the logic still work and better than before?  Say, it's a counter of
>> "messages" rather than "IOs" then.
>
> IIUC it is a counter of processing notifications which is not directly
> correlated to any action by QEMU - neither bytes nor syscalls.

Please correct me if I'm wrong about this, but isn't each notification a
report of what happened to an individual IO?

>> The problem with the old code was we may report fallback=0 even if there
>> can have fallback happened, as we mask that fact as long as one zerocopy
>> happened in the whole batch between two flushes.  So it seems this (even if
>> the counter is not per-IO) is still better.
>
> Better for what purpose though ?
>
> If we enabled zero-copy, it is useful to know if /something/ managed
> to benefit from zero-copy. ie if /always/ fails to zero-copy then
> we can diagnose that the NIC driver isn't capable of it, or there
> is some other limitation. If something manages to zero-copy, then
> we know the feature is functionally working.
>
> What will we do with a count of notificaitons ?

I was wondering whether it could be useful for debugging live migration
issues where zerocopy is enabled. For instance, say a zerocopy write fails
because the socket error queue is full. That could be due either to the
out-of-order processing we saw before
(https://github.com/qemu/qemu/commit/84005f4a2b8745e5934f955c045a0b4311cd0992)
or to the queue filling up because some copies were deferred. In the latter
case, this stat would be worthwhile for debugging.

Regards,
Tejus
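
To make the counting question concrete, here is a minimal sketch (not QEMU
code) of the three things one could count from the error queue. It assumes,
as the "ee_data - ee_info + 1" arithmetic in the patch suggests, that one
notification may cover a range of sends. The ex_* names are hypothetical,
and EX_ZEROCOPY_COPIED is a local stand-in for SO_EE_CODE_ZEROCOPY_COPIED.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for SO_EE_CODE_ZEROCOPY_COPIED (assumption for this sketch). */
#define EX_ZEROCOPY_COPIED 1

/* Hypothetical, trimmed-down view of one error-queue notification. */
struct ex_notif {
    uint32_t ee_info;   /* first send covered by this notification */
    uint32_t ee_data;   /* last send covered, inclusive */
    uint8_t  ee_code;   /* EX_ZEROCOPY_COPIED if the kernel fell back to copying */
};

struct ex_stats {
    uint64_t sends_completed;        /* what zero_copy_sent accumulates */
    uint64_t notifs_copied;          /* what a per-notification counter sees */
    uint64_t sends_in_copied_notifs; /* sends inside ranges flagged as copied */
};

/* Walk a batch of notifications and tally all three candidate counters. */
static struct ex_stats ex_tally(const struct ex_notif *n, size_t count)
{
    struct ex_stats st = {0, 0, 0};

    for (size_t i = 0; i < count; i++) {
        /* 32-bit unsigned arithmetic keeps the span right across wraparound. */
        uint32_t span = n[i].ee_data - n[i].ee_info + 1;

        st.sends_completed += span;
        if (n[i].ee_code == EX_ZEROCOPY_COPIED) {
            st.notifs_copied++;
            st.sends_in_copied_notifs += span;
        }
    }
    return st;
}
```

For a batch of three notifications covering sends 0-4 (zero copy), 5-5
(copied) and 6-9 (copied), this gives 10 sends completed, 2 copied
notifications and 5 sends inside copied ranges, so the per-notification
count and the per-send count can differ.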
