Re: [PATCH v2 1/1] io: make zerocopy fallback accounting more accurate

Tejus GK Mon, 09 Mar 2026 10:42:59 -0700


> On 9 Mar 2026, at 10:47 PM, Daniel P. Berrangé <[email protected]> wrote:
> 
> On Mon, Mar 09, 2026 at 12:59:44PM -0400, Peter Xu wrote:
>> On Mon, Mar 09, 2026 at 04:48:37PM +0000, Daniel P. Berrangé wrote:
>>>> @@ -881,8 +881,8 @@ static int qio_channel_socket_flush_internal(QIOChannel *ioc,
>>>>         sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
>>>> 
>>>>         /* If any sendmsg() succeeded using zero copy, mark zerocopy success */
>>>> -        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
>>>> -            sioc->new_zero_copy_sent_success = true;
>>>> +        if (serr->ee_code == SO_EE_CODE_ZEROCOPY_COPIED) {
>>>> +            sioc->zero_copy_fallback++;
>>> 
>>> ...this is counting the number of MSG_ERRQUEUE items, which is not
>>> the same as the number of IO requests. That's why we only used it
>>> as a boolean marker originally, rather than making it a counter.
>> 
>> Would the logic still work and better than before?  Say, it's a counter of
>> "messages" rather than "IOs" then.
> 
> IIUC it is a counter of processing notifications which is not directly
> correlated to any action by QEMU - neither bytes nor syscalls.


Please correct me if I'm wrong about this, but isn't each notification a
piece of information about what happened to an individual IO?
> 
>> The problem with the old code was we may report fallback=0 even if there
>> can have fallback happened, as we mask that fact as long as one zerocopy
>> happened in the whole batch between two flushes.  So it seems this (even if
>> the counter is not per-IO) is still better.
> 
> Better for what purpose though ?
> 
> If we enabled zero-copy, it is useful to know if /something/ managed
> to benefit from zero-copy. ie if /always/ fails to zero-copy then
> we can diagnose that the NIC driver isn't capable of it, or there
> is some other limitation.  If something manages to zero-copy, then
> we know the feature is functionally working.
> 
> What will we do with a count of notifications ?

I was wondering if it could be useful for debugging live migration issues
where zerocopy is enabled. For instance, let's say a zerocopy write failed
because the socket error queue was full. That could be due either to the
out-of-order processing we had seen before
(https://github.com/qemu/qemu/commit/84005f4a2b8745e5934f955c045a0b4311cd0992)
or to the queue filling up because some copies are getting deferred. For the
latter, this stat can be worthwhile as a debugging stat.
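
To make the two kinds of counter concrete, here is a minimal sketch of the
accounting being discussed. It is not QEMU code: struct zc_notif, struct
zc_stats, zc_account() and ZC_COPIED are hypothetical names that only mirror
the ee_info, ee_data and ee_code fields used in the hunk above. It assumes
that one error-queue notification covers the inclusive range of sendmsg()
calls [ee_info, ee_data] and that its ee_code applies to that whole range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sock_extended_err fields used in the patch. */
struct zc_notif {
    uint32_t ee_info;  /* first sendmsg() call covered (inclusive) */
    uint32_t ee_data;  /* last sendmsg() call covered (inclusive) */
    uint8_t  ee_code;  /* ZC_COPIED if the kernel fell back to copying */
};

/* Assumption: mirrors SO_EE_CODE_ZEROCOPY_COPIED from <linux/errqueue.h>. */
#define ZC_COPIED 1

struct zc_stats {
    uint64_t sent;            /* sendmsg() calls acknowledged */
    uint64_t fallback_notifs; /* notifications flagged as copied */
    uint64_t fallback_calls;  /* sendmsg() calls covered by those notifications */
};

/* Walk one batch of notifications and keep both kinds of fallback counter. */
static struct zc_stats zc_account(const struct zc_notif *n, size_t count)
{
    struct zc_stats s = {0, 0, 0};

    for (size_t i = 0; i < count; i++) {
        /* Same arithmetic as zero_copy_sent in the hunk above. */
        uint64_t calls = (uint64_t)(n[i].ee_data - n[i].ee_info) + 1;

        s.sent += calls;
        if (n[i].ee_code == ZC_COPIED) {
            s.fallback_notifs++;       /* what a per-notification counter sees */
            s.fallback_calls += calls; /* what a per-call counter would see */
        }
    }
    return s;
}
```

With this sketch, a batch where one flagged notification covers several
calls gives a different value for fallback_notifs than for fallback_calls,
which is the gap between "messages" and "IOs" raised earlier in the thread.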

Regards, 
Tejus
