Source: jellyfish
Version: 2.2.6-1
User: debian-...@lists.debian.org
Usertags: arm64

Jellyfish seems to be easy to port. Just provide alternatives to the
inline assembler in rectangular_binary_matrix.hpp:

    #ifdef __x86_64__
    #define AND_XOR(off) \
        asm("movdqa (%[s],%[i]), %[load]\n\t" \
            "pand " #off "(%[p]),%[load]\n\t" \
            "pxor %[load],%[acc]\n\t" \
            : [acc]"=&x"(acc) \
            : "[acc]"(acc), [i]"r"(i), [p]"r"(p), [s]"r"(smear), [load]"x"(load))
    #else
    #define AND_XOR(off) do { \
        xmm_t a = { smear[i / 8], smear[i / 8 + 1] }; \
        xmm_t b = { p[(off) / 8], p[(off) / 8 + 1] }; \
        acc ^= a & b; \
      } while (0)
    #endif

    #ifdef __x86_64__
        uint64_t res1, res2;
        asm("movd %[acc], %[res1]\n\t"
            "psrldq $8, %[acc]\n\t"
            "movd %[acc], %[res2]\n\t"
            : [res1]"=r"(res1), [res2]"=r"(res2)
            : [acc]"x"(acc));
        return res1 ^ res2;
    #else
        return acc[0] ^ acc[1];
    #endif

Then replace AND_XOR("0x30") with AND_XOR(0x30), AND_XOR("") with
AND_XOR(0), and so on.

You might find the non-assembler version performs just as well on
amd64, too, in which case you could simplify the code.

Tested on arm64. Likely to work on some other architectures.
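
For anyone wanting to check the non-assembler path in isolation, here is a minimal standalone sketch. It assumes xmm_t is a 128-bit GCC/Clang vector type with two uint64_t lanes (the actual typedef in rectangular_binary_matrix.hpp is not quoted in this report), and that i and off are byte offsets, as the addressing in the asm suggests. The function names and_xor and fold are hypothetical, used here only to make the macro body and the final reduction testable on their own.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: a 128-bit vector of two 64-bit lanes, via the GCC/Clang
// vector extension. The real definition in jellyfish may differ.
typedef std::uint64_t xmm_t __attribute__((vector_size(16)));

// Hypothetical function form of the portable AND_XOR(off) body.
// i and off are byte offsets, so dividing by 8 gives the uint64_t index.
static inline xmm_t and_xor(xmm_t acc, const std::uint64_t* smear,
                            std::size_t i, const std::uint64_t* p,
                            std::size_t off) {
  xmm_t a = { smear[i / 8], smear[i / 8 + 1] };    // 128 bits of smear at byte i
  xmm_t b = { p[off / 8], p[off / 8 + 1] };        // 128 bits of p at byte off
  return acc ^ (a & b);                            // same as pand followed by pxor
}

// Hypothetical name for the final reduction: XOR the two 64-bit lanes,
// which is what the movd / psrldq / movd sequence computes on x86_64.
static inline std::uint64_t fold(xmm_t acc) {
  return acc[0] ^ acc[1];
}
```

Since this only relies on the compiler's vector extension, it should build unchanged on arm64 and amd64, which makes it easy to compare against the asm version on amd64 before deciding whether to drop the asm entirely.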