On 2/6/24 21:13, Alexander Monakov wrote:
>> Thank you. I agree we shouldn't penalize non-x86 hosts here, but to be
>> honest I'd really like to keep this optimization, because so many places
>> in QEMU invoke buffer_is_zero() with a constant length, allowing the
>> compiler to optimize out the length test.
> 
> Hmm. True, both migration and image copy use large blocks frequently.
> Would you be open to testing the availability of the optimized variants
> in the inline wrapper, like this:
> 
> diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> index 62b153e603..7a2145ffef 100644
> --- a/include/qemu/cutils.h
> +++ b/include/qemu/cutils.h
> @@ -209,11 +209,12 @@ static inline bool buffer_is_zero(const void *vbuf, size_t len)
>          return true;
>      }
>  
> +#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
>      if (len >= 256) {
>          return buffer_is_zero_len_256_plus(vbuf, len);
> -    } else {
> -        return buffer_is_zero_len_4_plus(vbuf, len);
>      }
> +#endif
> +    return buffer_is_zero_len_4_plus(vbuf, len);

Plausible.
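
To make the quoted constant-length point concrete: because the wrapper
stays inline, a call site with a literal length lets the compiler fold
the "len >= 256" test and emit a direct call to the right variant. A
minimal sketch (the caller below is hypothetical, not from the patch):

#include "qemu/cutils.h"

/* With buffer_is_zero() inline and len a compile-time constant, the
 * compiler resolves "len >= 256" at compile time and calls
 * buffer_is_zero_len_256_plus() directly, with no runtime length test. */
static bool page_is_blank(const void *page)
{
    return buffer_is_zero(page, 4096);
}
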
Also, now that we're down to two variants instead of four, perhaps a statically predicted
direct branch or two might be better than an indirect branch? E.g.
bool buffer_is_zero_len_256_plus(const void *buf, size_t len)
{
#ifdef CONFIG_AVX2_OPT
    if (select_accel & CPUINFO_AVX2) {
        return buffer_zero_avx2(buf, len);
    }
#endif
#ifdef __SSE2__
    if (select_accel & CPUINFO_SSE2) {
        return buffer_zero_sse2(buf, len);
    }
#endif
    return buffer_is_zero_len_4_plus(buf, len);
}
where select_accel would be set by test_buffer_is_zero_next_accel() etc.
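
As a sketch of that setup (assuming a cpuinfo-style feature word; the
init_accel() helper and its wiring are illustrative, not the actual
patch):

/* Choose at most one accelerator bit up front, so each direct branch
 * in buffer_is_zero_len_256_plus() tests a single, statically
 * predictable condition.  The CPUINFO_* bits follow the sketch above;
 * init_accel() is hypothetical. */
static unsigned select_accel;

static void init_accel(unsigned cpuinfo)
{
#ifdef CONFIG_AVX2_OPT
    if (cpuinfo & CPUINFO_AVX2) {
        select_accel = CPUINFO_AVX2;
        return;
    }
#endif
#ifdef __SSE2__
    if (cpuinfo & CPUINFO_SSE2) {
        select_accel = CPUINFO_SSE2;
        return;
    }
#endif
    select_accel = 0;  /* fall back to buffer_is_zero_len_4_plus() */
}

test_buffer_is_zero_next_accel() could then clear the current bit and
re-run the same selection, exercising each variant in turn.
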
r~