Hello!

I am interested in making a high-performance project which involves a
lot of math, which is why I am interested in using SIMD (AVX2) on x86_64
(and for fun as well, if I'm honest). I am coming mainly from the C and
C++ world where one has intrinsics (such as `_mm256_add_epi64`, to give
an example from the Intel® Intrinsics Guide). I am most familiar with
GCC (and, to a lesser extent, with Clang and ICC), where one can access
these intrinsics through headers such as `<immintrin.h>`. Is there a Free
Pascal equivalent for that?

I am well aware I can use asm blocks, but some intrinsics expand to more
than one instruction, and in C it's the compiler's responsibility to pick
the best instruction sequence for a given intrinsic.

Basically, can I implement intrinsics such as `_mm256_add_epi64` directly,
so that they're equivalent to doing the same thing in C? If not, what
would be the best course of action to make wrappers for these intrinsics?
I tried this:

```
program AVX2Example;

{$mode objfpc}{$H+}{$asmmode intel}

uses
  SysUtils;

type
  __m256i = packed array[0..3] of int64;

function _mm256_loadu_si256(src: __m256i): __m256i; assembler;
asm
    vmovdqu ymm0, ymmword ptr [src]
    vmovdqa [Result], ymm0
end;

function _mm256_add_epi64(a, b: __m256i): __m256i; assembler;
asm
    vmovdqa ymm0, [a]
    vmovdqa ymm1, [b]
    vpaddq ymm0, ymm0, ymm1
    vmovdqa [Result], ymm0
end;

var
  a: __m256i = (1, 2, 3, 4);
  b: __m256i = (5, 6, 7, 8);

  a1, a2: __m256i;
  res: __m256i;
  e: int64;
begin
  a1 := _mm256_loadu_si256(a);
  a2 := _mm256_loadu_si256(b);

  res := _mm256_add_epi64(a1, a2);

  for e in res do
  begin
    Write(e, ' ');
  end;
  Writeln;

end.
```

but it only works half of the time, so something is wrong.
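
For comparison, the result the wrappers above are meant to produce can be written in plain Pascal without any assembly. This is only a scalar sketch, not an AVX2 implementation: `TInt64x4` and `AddEpi64Ref` are hypothetical names that don't exist in FPC, and it assumes that `_mm256_add_epi64` simply adds four 64-bit lanes independently with wraparound. It could serve as a reference to check an asm wrapper against:

```
program AddEpi64Reference;

{$mode objfpc}{$H+}
{$Q-}{$R-} // overflow/range checks off, so each lane wraps on overflow

type
  // Same layout as the __m256i type in the post: four 64-bit lanes.
  // TInt64x4 is a hypothetical name used only for this sketch.
  TInt64x4 = packed array[0..3] of Int64;

// Hypothetical scalar reference: adds the four lanes independently.
function AddEpi64Ref(const a, b: TInt64x4): TInt64x4;
var
  i: Integer;
begin
  for i := 0 to 3 do
    Result[i] := a[i] + b[i];
end;

const
  a: TInt64x4 = (1, 2, 3, 4);
  b: TInt64x4 = (5, 6, 7, 8);

var
  res: TInt64x4;
  e: Int64;
begin
  res := AddEpi64Ref(a, b);

  for e in res do
    Write(e, ' ');
  Writeln; // prints: 6 8 10 12
end.
```

Since it uses no alignment-sensitive instructions, this version should give the same output on every run, which makes it a stable baseline for the asm version.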

Kind regards,
Stefan.
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
