Hello Niels,

On Tue, Feb 02, 2021 at 06:09:42PM +0100, Niels Möller wrote:

> > I've downloaded binary builds of clang for aarch64 from
> > https://releases.llvm.org/download.html. 3.9.1 was the oldest prebuilt
> > toolchain I could find there and 11.0.0 the most recent.
> [...]

> > They also all support the .arch directive:
> >
> > $ cat t.s
> > .arch armv8-a+crypto
> > pmull v2.1q, v2.1d, v1.1d
> > $ aarch64-unknown-linux-gnu-as -o t.o t.s
> > $ clang+llvm-3.9.1-aarch64-linux-gnu/bin/clang -c -o t.o t.s
> > $ clang+llvm-11.0.0-aarch64-linux-gnu/bin/clang -c -o t.o t.s
> Thanks for investigating. The .arch pseudoop it is, then.

> I've pushed a change to use that, instead of modifying CFLAGS.

The arm64 branch builds and passes the testsuite on aarch64 and
aarch64_be with gcc 10.2 and clang 11.0.1 with and without the optimized
assembly routines on my pine64 boards. This is with the .arch directive
instead of modifying CFLAGS and the new configure option name
--enable-arm64-crypto.

Out of curiosity I've also collected some benchmark numbers for
gcm_aes256. (Is that a correct and sensible algorithm for that purpose?)

The speedup from using pmull seems to be around 35% for encrypt/decrypt
with gcc, and roughly 50% with clang, whose non-assembly baseline is
slower.

Interestingly, LE is about a cycle per block faster than BE even though
it should have quite a few more rev64s to execute than BE. Could this be
masked by memory accesses, pipelining or scheduling?

How should the massive speedup in update be interpreted, and the fact
that BE is quite a bit faster than LE there? Do I understand correctly
that update runs only GCM over unencrypted data for authentication
purposes, so that this number really shows the pure GCM pmull speedup?
If so, it would indicate a 19-fold speedup and an advantage of about
8.7% for BE.
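
As a rough cross-check of these ratios, they can be recomputed from the
Mbyte/s column of the tables further down. A minimal Python sketch; the
dictionary layout and the speedup() helper are illustrative only, and the
numbers are copied by hand from the gcc 10.2 runs:

```python
# Throughput in Mbyte/s, copied from the gcc 10.2 tables in this mail.
# The structure and names here are illustrative, not part of nettle.
with_pmull = {
    "le": {"encrypt": 29.42, "update": 1417.32},
    "be": {"encrypt": 29.35, "update": 1540.34},
}
without_pmull = {
    "le": {"encrypt": 21.43, "update": 74.39},
    "be": {"encrypt": 21.71, "update": 79.01},
}


def speedup(new, old):
    """Return new/old as a plain ratio (1.0 means no change)."""
    return new / old


# Speedup of the pmull build over the plain C build, per endianness.
for endian in ("le", "be"):
    for mode in ("encrypt", "update"):
        ratio = speedup(with_pmull[endian][mode], without_pmull[endian][mode])
        print(f"{endian} {mode}: {ratio:.2f}x")

# Relative advantage of BE over LE for update with pmull enabled.
be_vs_le = speedup(with_pmull["be"]["update"], with_pmull["le"]["update"])
print(f"update, BE vs LE: {be_vs_le:.3f}x")
```

The same helper applied to the clang rows gives the corresponding clang
ratios.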

What's also curious is that the system's openssl 1.1.1i is consistently
reported as an order of magnitude faster than nettle. I guess the major
factor is that nettle has no optimized AES for aarch64 yet, which
openssl seems to have. To check, I built openssl 1.1.1i without
assembly; the last aarch64 benchmark below uses that build and supports
this explanation.
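
For reading the tables below: the three numeric columns appear to be
related through the frequency passed with -f. The following Python
sketch reproduces the cycles/byte and cycles/block columns from the
Mbyte/s column. It assumes that "Mbyte" means 2^20 bytes and that a
block is 16 bytes; both assumptions are inferred from the printed
numbers, not taken from the nettle-benchmark source, and the function
names are made up for illustration:

```python
BLOCK_SIZE = 16   # assumed block size in bytes (AES/GCM block)
MBYTE = 2 ** 20   # assumption: the benchmark's "Mbyte" is 2^20 bytes


def cycles_per_byte(mbyte_per_s, freq_hz):
    """Convert throughput in Mbyte/s to CPU cycles spent per byte."""
    return freq_hz / (mbyte_per_s * MBYTE)


def cycles_per_block(mbyte_per_s, freq_hz, block_size=BLOCK_SIZE):
    """Convert throughput in Mbyte/s to CPU cycles spent per block."""
    return cycles_per_byte(mbyte_per_s, freq_hz) * block_size


freq = 1.152e9  # value passed to nettle-benchmark with -f on the pine64

# Rows taken from the aarch64-le gcc 10.2 run with arm64-crypto.
for mode, rate in (("encrypt", 29.42), ("update", 1417.32)):
    print(f"{mode}: {cycles_per_byte(rate, freq):.2f} cycles/byte, "
          f"{cycles_per_block(rate, freq):.2f} cycles/block")
```

Small differences in the last digits against the tables are expected,
because the printed Mbyte/s values are already rounded.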

cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
performance
cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_max_freq
1152000
LD_LIBRARY_PATH=../.lib ./nettle-benchmark -f 1.152e9 gcm_aes256

         Algorithm         mode Mbyte/s cycles/byte cycles/block

aarch64-le gcc 10.2 with arm64-crypto:
        gcm_aes256      encrypt   29.42       37.34       597.41
        gcm_aes256      decrypt   29.43       37.34       597.36
        gcm_aes256       update 1417.32        0.78        12.40

openssl gcm_aes256      encrypt  391.93        2.80        44.85
openssl gcm_aes256      decrypt  392.35        2.80        44.80
openssl gcm_aes256       update 1246.04        0.88        14.11

aarch64-be gcc 10.2 with arm64-crypto:
        gcm_aes256      encrypt   29.35       37.43       598.82
        gcm_aes256      decrypt   29.36       37.42       598.77
        gcm_aes256       update 1540.34        0.71        11.41

openssl gcm_aes256      encrypt  398.96        2.75        44.06
openssl gcm_aes256      decrypt  397.66        2.76        44.20
openssl gcm_aes256       update 1306.05        0.84        13.46

aarch64-le clang 11.0.1 with arm64-crypto:
        gcm_aes256      encrypt   28.76       38.20       611.15
        gcm_aes256      decrypt   28.76       38.19       611.10
        gcm_aes256       update 1416.17        0.78        12.41

openssl gcm_aes256      encrypt  392.32        2.80        44.81
openssl gcm_aes256      decrypt  392.35        2.80        44.80
openssl gcm_aes256       update 1247.72        0.88        14.09

aarch64-be clang 11.0.1 with arm64-crypto:
        gcm_aes256      encrypt   28.70       38.28       612.53
        gcm_aes256      decrypt   28.69       38.29       612.59
        gcm_aes256       update 1543.87        0.71        11.39

openssl gcm_aes256      encrypt  399.46        2.75        44.00
openssl gcm_aes256      decrypt  398.90        2.75        44.07
openssl gcm_aes256       update 1317.87        0.83        13.34

aarch64-le gcc 10.2 without arm64-crypto:
        gcm_aes256      encrypt   21.43       51.27       820.28
        gcm_aes256      decrypt   21.43       51.27       820.30
        gcm_aes256       update   74.39       14.77       236.30

openssl gcm_aes256      encrypt  391.93        2.80        44.85
openssl gcm_aes256      decrypt  392.17        2.80        44.82
openssl gcm_aes256       update 1245.13        0.88        14.12

aarch64-be gcc 10.2 without arm64-crypto:
        gcm_aes256      encrypt   21.71       50.60       809.58
        gcm_aes256      decrypt   21.72       50.59       809.43
        gcm_aes256       update   79.01       13.90       222.47

openssl gcm_aes256      encrypt  398.43        2.76        44.12
openssl gcm_aes256      decrypt  398.67        2.76        44.09
openssl gcm_aes256       update 1309.52        0.84        13.42

aarch64-le clang 11.0.1 without arm64-crypto:
        gcm_aes256      encrypt   18.98       57.89       926.29
        gcm_aes256      decrypt   18.98       57.89       926.22
        gcm_aes256       update   53.67       20.47       327.53

openssl gcm_aes256      encrypt  392.16        2.80        44.82
openssl gcm_aes256      decrypt  392.17        2.80        44.82
openssl gcm_aes256       update 1248.30        0.88        14.08

aarch64-be clang 11.0.1 without arm64-crypto:
        gcm_aes256      encrypt   18.89       58.16       930.49
        gcm_aes256      decrypt   18.85       58.28       932.54
        gcm_aes256       update   53.67       20.47       327.53

openssl gcm_aes256      encrypt  399.36        2.75        44.02
openssl gcm_aes256      decrypt  398.87        2.75        44.07
openssl gcm_aes256       update 1318.44        0.83        13.33

aarch64-be gcc 10.2 without arm64-crypto and with no-asm openssl:
LD_LIBRARY_PATH=../../openssl-1.1.1i:../.lib ./nettle-benchmark -f 1.152e9 gcm_aes256

         Algorithm         mode Mbyte/s cycles/byte cycles/block

        gcm_aes256      encrypt   21.72       50.59       809.43
        gcm_aes256      decrypt   21.72       50.59       809.45
        gcm_aes256       update   79.02       13.90       222.45

openssl gcm_aes256      encrypt   21.06       52.17       834.70
openssl gcm_aes256      decrypt   21.34       51.49       823.82
openssl gcm_aes256       update   56.18       19.55       312.87

x86_64 Intel Skylake laptop gcc 10.2 fat as sanity check:
NETTLE_FAT_VERBOSE=1 LD_LIBRARY_PATH=../.lib ./nettle-benchmark -f 4.6e9 aes256
libnettle: fat library initialization.
libnettle: cpu features: vendor:intel,aesni
libnettle: using aes instructions.
libnettle: not using sha_ni instructions.
libnettle: intel SSE2 will be used for memxor.
sha1_compress: 209.50 cycles
salsa20_core: 205.70 cycles
sha3_permute: 918.50 cycles (38.27 / round)

         Algorithm         mode Mbyte/s cycles/byte cycles/block

            aes256  ECB encrypt 4856.60        0.90        14.45
            aes256  ECB decrypt 4800.03        0.91        14.62
            aes256  CBC encrypt  889.91        4.93        78.87
            aes256  CBC decrypt 4331.24        1.01        16.21
            aes256   (in-place) 3516.29        1.25        19.96
            aes256          CTR 3131.58        1.40        22.41
            aes256   (in-place) 2826.07        1.55        24.84

    openssl aes256  ECB encrypt 4840.40        0.91        14.50
    openssl aes256  ECB decrypt 4835.88        0.91        14.51

        gcm_aes256      encrypt  585.60        7.49       119.86
        gcm_aes256      decrypt  585.29        7.50       119.92
        gcm_aes256       update  697.69        6.29       100.60

openssl gcm_aes256      encrypt 4499.49        0.97        15.60
openssl gcm_aes256      decrypt 4498.84        0.98        15.60
openssl gcm_aes256       update 11383.81        0.39         6.17

Just out of curiosity: I assume there's no aesni-pmull-like GCM
implementation for x86_64?
-- 
Thanks,
Michael