target_clones for more than just a single function

Matthias Kretz via Gcc Fri, 17 Oct 2025 19:11:23 -0700

The target_clones attribute documentation says:
> Note that any subsequent call of a function without target_clone from a
> target_clone caller will not lead to copying (target clone) of the called
> function. If you want to enforce such behavior, we recommend declaring the
> calling function with the flatten attribute?


This is nowhere near what we need for explicit vectorization, i.e. when 
you need to know the native SIMD width and which intrinsics/builtins are valid 
to call.
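
To make the gap concrete, here is a minimal sketch (not taken from the GCC documentation) of why flatten alone does not help. It assumes the SIMD width is derived from predefined macros such as __AVX__ and __AVX512F__, which is only one possible way a library might do it. The names macro_float_width, SKETCH_CLONES, width_seen_by_clone and check_width_is_clone_independent are made up for illustration. Since the macros are evaluated by the preprocessor, long before the clones are created, every clone ends up with the same width:

```cpp
#include <cassert>
#include <cstddef>

// Number of floats per native vector, derived from predefined macros.
// These macros reflect only the command-line -march/-m flags.
constexpr std::size_t macro_float_width() {
#if defined(__AVX512F__)
  return 16;
#elif defined(__AVX__)
  return 8;
#else
  return 4;  // SSE2 baseline; merely a placeholder on non-x86 targets
#endif
}

// Assumption: only GCC on x86-64 Linux with glibc (ifunc support) gets the
// attributes; elsewhere the macro expands to nothing so the sketch compiles.
#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && \
    defined(__GNUC__) && !defined(__clang__)
#define SKETCH_CLONES \
  [[gnu::target_clones("default,avx2,avx512f"), gnu::flatten]]
#else
#define SKETCH_CLONES
#endif

// flatten inlines macro_float_width() into every clone, but the #if chain
// above was already resolved by the preprocessor, so all clones agree.
SKETCH_CLONES std::size_t width_seen_by_clone() {
  return macro_float_width();
}

// Holds on any CPU: the clone chosen by the resolver does not matter.
void check_width_is_clone_independent() {
  assert(width_seen_by_clone() == macro_float_width());
}
```

Whichever clone the resolver selects at run time, the AVX2 and AVX-512 clones still report the width chosen for the command-line target, which is the opposite of what explicit SIMD code needs.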

I've been adding a few thoughts and ideas on the topic at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875. But before I go any
further I'd like to know whether there's a chance this is ever going to happen.

End goal (made up example):

------------------------------
#include <simd>
#include <span>

namespace simd = std::simd;

[[gnu::target_clones("arch=x86-64,arch=x86-64-v2,arch=x86-64-v3,arch=x86-64-v4")]]
int do_work(std::span<float> data)
{
  using Vf = simd::vec<float>;
  Vf v = simd::unchecked_load<Vf>(data);
  if (all_of(v == 0.f))
    return 0;
  v += 1.f;
  simd::unchecked_store(v, data);
  return Vf::size();
}
------------------------------

This example needs three distinct Vf types across the four clones. For 
x86-64 and x86-64-v2, Vf is the same type, but 'all_of' is implemented 
differently (using the ptest builtin for v2).
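
For comparison, this is roughly what has to be written by hand today to get the same effect. It is only a sketch under my own assumptions: the function names (do_work_sse2, do_work_sse41, do_work_avx2, do_work_manual) are invented, it takes a raw pointer instead of std::span to stay within C++17, it leaves out the x86-64-v4 variant for brevity, and the caller must provide at least 8 floats. It uses per-function target attributes, plain intrinsics, and __builtin_cpu_supports for dispatch, with a scalar placeholder on non-x86 targets:

```cpp
#include <cassert>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

// Baseline x86-64: 4 floats per vector, all_of via compare + movemask.
static int do_work_sse2(float* data) {
  __m128 v = _mm_loadu_ps(data);
  if (_mm_movemask_ps(_mm_cmpeq_ps(v, _mm_setzero_ps())) == 0xF)
    return 0;
  _mm_storeu_ps(data, _mm_add_ps(v, _mm_set1_ps(1.f)));
  return 4;
}

// x86-64-v2 level: same vector type, but all_of can use ptest (SSE4.1).
[[gnu::target("sse4.1")]]
static int do_work_sse41(float* data) {
  __m128 v = _mm_loadu_ps(data);
  __m128i eq = _mm_castps_si128(_mm_cmpeq_ps(v, _mm_setzero_ps()));
  if (_mm_test_all_ones(eq))
    return 0;
  _mm_storeu_ps(data, _mm_add_ps(v, _mm_set1_ps(1.f)));
  return 4;
}

// x86-64-v3 level: a different vector type holding 8 floats.
[[gnu::target("avx2")]]
static int do_work_avx2(float* data) {
  __m256 v = _mm256_loadu_ps(data);
  __m256 eq = _mm256_cmp_ps(v, _mm256_setzero_ps(), _CMP_EQ_OQ);
  if (_mm256_movemask_ps(eq) == 0xFF)
    return 0;
  _mm256_storeu_ps(data, _mm256_add_ps(v, _mm256_set1_ps(1.f)));
  return 8;
}

// Manual run-time dispatch; the caller guarantees at least 8 valid floats.
int do_work_manual(float* data) {
  assert(data != nullptr);
  if (__builtin_cpu_supports("avx2"))
    return do_work_avx2(data);
  if (__builtin_cpu_supports("sse4.1"))
    return do_work_sse41(data);
  return do_work_sse2(data);
}
#else
// Scalar placeholder for other targets: a "vector" of one float.
int do_work_manual(float* data) {
  assert(data != nullptr);
  if (data[0] == 0.f)
    return 0;
  data[0] += 1.f;
  return 1;
}
#endif
```

The body is written three times even though the logic is identical. The point of the proposal is that a single body using std::simd could replace the variants if the vector type and the 'all_of' implementation were allowed to differ per clone.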

I have ideas for how this could be done (in principle). I can't implement it in 
the compiler for lack of time and knowledge. But before I invest more time in 
specifying a solution idea and preparing the std::simd code for working like 
that, I'd like to know whether there's interest.

- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────
