On Fri, Nov 20, 2020 at 3:40 PM Niels Möller <ni...@lysator.liu.se> wrote:
>
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > It could likely be sped up further by processing 2, 3 or 4 blocks in
> > parallel.
>
> I've given 2 blocks in parallel a try, but it's not quite working yet. My
> work-in-progress code is below.
>
> When I test it on the gcc112 machine, it fails with an illegal
> instruction (SIGILL) on this line, close to function entry:
>
>   .globl _nettle_chacha_2core
>   .type _nettle_chacha_2core,%function
>   .align 5
>   _nettle_chacha_2core:
>   addis 2,12,(.TOC.-_nettle_chacha_2core)@ha
>   addi 2,2,(.TOC.-_nettle_chacha_2core)@l
>   .localentry _nettle_chacha_2core, .-_nettle_chacha_2core
>
>
>           li      r8, 0x30
>           vspltisw v1, 1
>   =>      vextractuw v1, v1, 0
>
> I don't understand, from the manual, what's wrong with this. The
> intention of this piece of code is just to construct the value {1, 0, 0,
> 0} in one of the vector registers. Maybe there's a better way to do
> that?

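GCC112 is a POWER8 machine. According to the POWER manuals, vextractuw
is a POWER9 instruction: it was added in Power ISA 3.0, while POWER8
implements ISA 2.07, so executing it there raises SIGILL.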

POWER8 manual: 
https://openpowerfoundation.org/?resource_lib=power8-processor-users-manual
POWER9 manual: 
https://openpowerfoundation.org/?resource_lib=power9-processor-users-manual

Jeff
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
