Hello! I am interested in making a high-performance project which involves a lot of math, which is why I am interested in using SIMD (AVX2) on x86_64 (and for fun as well, if I'm honest). I am coming mainly from the C and C++ world, where one has intrinsics (such as `_mm256_add_epi64`, to give an example from the Intel® Intrinsics Guide). I am most familiar with GCC (and to a lesser extent with Clang and ICC), where one can access these intrinsics through headers such as `<immintrin.h>`. Is there a Free Pascal equivalent for that?
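
To make concrete what I mean by "equivalent": `_mm256_add_epi64` adds four 64-bit lanes independently. Below is a minimal plain-Pascal sketch of that behaviour, with no SIMD involved. The names `TVec4x64` and `AddEpi64Ref` are made up for illustration and are not part of any FPC unit, and it assumes overflow checking is off (which I believe is the FPC default), so the additions wrap around as the instruction does.

```pascal
program AddEpi64Reference;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a 256-bit vector: four 64-bit lanes.
  TVec4x64 = array[0..3] of Int64;

// Scalar reference: adds each lane independently, no SIMD involved.
// With overflow checking off ({$Q-}), each addition wraps around.
function AddEpi64Ref(const a, b: TVec4x64): TVec4x64;
var
  i: Integer;
begin
  for i := 0 to 3 do
    Result[i] := a[i] + b[i];
end;

const
  x: TVec4x64 = (1, 2, 3, 4);
  y: TVec4x64 = (5, 6, 7, 8);
var
  r: TVec4x64;
  e: Int64;
begin
  r := AddEpi64Ref(x, y);
  for e in r do
    Write(e, ' ');   // prints: 6 8 10 12
  Writeln;
end.
```

Any intrinsic-style wrapper should give the same four values for the same inputs, so this can serve as a baseline to compare against.
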
I am well aware I can use asm blocks, but some intrinsics expand to more than one instruction, and in C it's the compiler's responsibility to find the best instruction sequence for a given intrinsic. Basically, can I implement `_mm256_add_epi64` directly so that it is equivalent to doing the same thing in C? If not, what would be the best course of action to make wrappers for these intrinsics? I tried this:

```
program AVX2Example;
{$mode objfpc}{$H+}{$asmmode intel}
uses SysUtils;

type
  __m256i = packed array[0..3] of int64;

function _mm256_loadu_si256(src: __m256i): __m256i; assembler;
asm
  vmovdqu ymm0, ymmword ptr [src]
  vmovdqa [Result], ymm0
end;

function _mm256_add_epi64(a, b: __m256i): __m256i; assembler;
asm
  vmovdqa ymm0, [a]
  vmovdqa ymm1, [b]
  vpaddq ymm0, ymm0, ymm1
  vmovdqa [Result], ymm0
end;

var
  a: __m256i = (1, 2, 3, 4);
  b: __m256i = (5, 6, 7, 8);
  a1, a2: __m256i;
  res: __m256i;
  e: int64;
begin
  a1 := _mm256_loadu_si256(a);
  a2 := _mm256_loadu_si256(b);
  res := _mm256_add_epi64(a1, a2);
  for e in res do
  begin
    Write(e, ' ');
  end;
  Writeln;
end.
```

but it only works half of the time, so something is wrong.

Kind regards,
Stefan.