Hi Richard,

sorry for not answering sooner. I took action on your mail but failed to also give feedback. Now, in light of your veto of Srinivas' patch, I wanted to use the opportunity to pick this up again.

On Tuesday, 23 January 2024 21:57:23 CET Richard Sandiford wrote:
> However, we also support different vector lengths for streaming SVE
> (running in "streaming" mode on SME) and non-streaming SVE (running
> in "non-streaming" mode on the core).  Having two different lengths is
> expected to be the common case, rather than a theoretical curiosity.

I read up on this after you mentioned it for the first time. As a WG21 member I find the approach troublesome, but that's a bit off-topic for this thread.

The big issue here is that, IIUC, a user (and the simd library) cannot do the right thing at the moment. There simply isn't enough context information available when parsing the <experimental/simd> header. I.e. on definition of the class template there's no facility to take target_clones or SME "streaming" mode into account. Consequently, if we want the library to be fit for SME, then we need more language extension(s) to make it work.

I guess I'm looking for a way to declare types that are different depending on whether they are used in streaming mode or non-streaming mode (making them ill-formed to use in functions marked arm_streaming_compatible).

From reading through
https://arm-software.github.io/acle/main/acle.html#controlling-the-use-of-streaming-mode
I don't see any discussion of member functions or ctor/dtor, static and non-static data members, etc.

The big issue I see here is that currently all of std::* is declared without an arm_streaming or arm_streaming_compatible. Thus, IIUC, you can't use anything from the standard library in streaming mode. Since that also applies to std::experimental::simd, we're not creating a new footgun, only missing out on potential users?

Some more thoughts on target_clones/streaming SVE language extension evolution:

  void nonstreaming_fn(void) {
    constexpr int width = __arm_sve_bits(); // e.g. 512
    constexpr int width2 = __builtin_vector_size(); // e.g. 64 (the
      // vector_size attribute works with bytes, not bits)
  }

  __attribute__((arm_locally_streaming))
  void streaming_fn(void) {
    constexpr int width = __arm_sve_bits(); // e.g. 128
    constexpr int width2 = __builtin_vector_size(); // e.g. 16
  }

  __attribute__((target_clones("sse4.2,avx2")))
  void streaming_fn(void) {
    constexpr int width = __builtin_vector_size(); // 16 in the sse4.2 clone
                                                   // and 32 in the avx2 clone
  }

... as a starting point for exploration. Given this, I'd still have to resort to a macro to define a "native" simd type:

  #define NATIVE_SIMD(T) std::experimental::simd<T, \
    _SveAbi<__arm_sve_bits() / CHAR_BIT, __arm_sve_bits() / CHAR_BIT>>

Getting rid of the macro seems to be even harder. A declaration of an alias like

  template <typename T>
  using SveSimd = std::experimental::simd<T,
    _SveAbi<__arm_sve_bits() / CHAR_BIT, __arm_sve_bits() / CHAR_BIT>>;

would have to delay "invoking" __arm_sve_bits() until it knows its context:

  void nonstreaming_fn(void) {
    static_assert(sizeof(SveSimd<float>) == 64);
  }

  __attribute__((arm_locally_streaming))
  void streaming_fn(void) {
    static_assert(sizeof(SveSimd<float>) == 16);
    nonstreaming_fn(); // fine
  }

This gets even worse for target_clones, where

  void f() {
    sizeof(std::simd<float>) == ?
  }

  __attribute__((target_clones("sse4.2,avx2")))
  void g() {
    f();
  }

the compiler *must* virally apply target_clones to all functions it calls. And member functions must either also get cloned as functions, or the whole type must be cloned (as in the std::simd case, where the sizeof needs to change). 😳

> When would NumberOfUsedBytes < SizeofRegister be used for SVE?  Would it
> be for storing narrower elements in wider containers?  If the interface
> supports that then, yeah, two parameters would probably be safer.
>
> Or were you thinking about emulating narrower vectors with wider registers
> using predication?  I suppose that's possible too, and would be similar in
> spirit to using SVE to optimise Advanced SIMD std::simd types.
> But mightn't it cause confusion if sizeof applied to a "16-byte"
> vector actually gives 32?

Yes, the idea is to e.g. use one SVE register instead of two NEON registers for a "float, 8" with SVE512. The user never asks for a "16-byte" vector. The user asks for a value type and a number of elements. Sure, the wasteful "padding" might come as a surprise, but it's totally within the spec to implement it like this.

> I assume std::experimental::native_simd<int> has to have the same
> meaning everywhere for ODR reasons?

No. Only std::experimental::simd<int> has to be "ABI stable". And note that in the C++ spec there's no such thing as compiling and linking TUs with different compiler flags. That's plain UB. The committee still cares about it, but getting this "right" cannot be part of the standard and must be defined by implementers.

> If so, it'll need to be an
> Advanced SIMD vector for AArch64 (but using SVE to optimise certain
> operations under the hood where available).  I don't think we could
> support anything else.

simd<int> on AArch64 uses [[gnu::vector_size(16)]].

> Even if SVE registers are larger than 128 bits, we can't require
> all code in the binary to be recompiled with that knowledge.
>
> I suppose that breaks the "largest" requirement, but due to the
> streaming/non-streaming distinction I mentioned above, there isn't
> really a single, universal "largest" in this context.

There is, but it's context-dependent. I'd love to make this work.

> SVE and Advanced SIMD are architected to use the same registers
> (i.e. SVE registers architecturally extend Advanced SIMD registers).
> In Neoverse V1 (SVE256) they are the same physical register as well.
> I believe the same is true for A64FX.

That's good to know. 👍

> FWIW, GCC has already started using SVE in this way.  E.g. SVE provides
> a wider range of immediate constants for logic operations, so we now use
> them for Advanced SIMD logic where beneficial.

I will consider these optimizations (when necessary in the library) for the C++26 implementation.

Best,
Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────