Not on the list or I would've replied directly, but on Haswell, ChaCha20 (in
software) is over 2x as fast as AES (in hardware), at realistic (for a
filesystem) block sizes:
 
testing speed of ctr(aes) (ctr(aes-aesni)) decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 378 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 1130 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 3981 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 15458 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 122880 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 391 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 1193 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 4212 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 16388 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 131029 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 417 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 1222 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 4398 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 17114 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 137028 cycles (8192 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 1 operation in 4356 cycles (16 bytes)
test 1 (256 bit key, 64 byte blocks): 1 operation in 4004 cycles (64 bytes)
test 2 (256 bit key, 256 byte blocks): 1 operation in 6524 cycles (256 bytes)
test 3 (256 bit key, 1024 byte blocks): 1 operation in 9248 cycles (1024 bytes)
test 4 (256 bit key, 8192 byte blocks): 1 operation in 60274 cycles (8192 bytes)
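
To make the comparison easier to read, here is a small sketch that turns the cycle counts above into cycles per byte and a per-block-size ratio. The numbers are copied from the tcrypt output; the table and function names are just illustrative.

```python
# Cycle counts copied from the tcrypt output above, keyed by block size in bytes.
CTR_AES_128 = {16: 378, 64: 1130, 256: 3981, 1024: 15458, 8192: 122880}
CTR_AES_256 = {16: 417, 64: 1222, 256: 4398, 1024: 17114, 8192: 137028}
CHACHA20 = {16: 4356, 64: 4004, 256: 6524, 1024: 9248, 8192: 60274}


def cycles_per_byte(table):
    """Convert {block_size: cycles} into {block_size: cycles per byte}."""
    return {size: cycles / size for size, cycles in table.items()}


def speedup(baseline, candidate):
    """Ratio baseline/candidate per block size; > 1 means candidate is faster."""
    return {size: baseline[size] / candidate[size] for size in candidate}


if __name__ == "__main__":
    for name, aes in (("aes-128", CTR_AES_128), ("aes-256", CTR_AES_256)):
        for size, ratio in speedup(aes, CHACHA20).items():
            print(f"{name} vs chacha20, {size:5d} bytes: {ratio:.2f}x")
```

At 8192 bytes this works out to 15.0 cycles/byte for AES-128-CTR against about 7.4 for ChaCha20. At 16 to 256 bytes the AES-NI code is still ahead in these numbers, so the 2x only applies to the larger blocks.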

Poly1305 is also plenty fast:

testing speed of gcm(aes) (gcm_base(ctr-aes-aesni,ghash-generic)) encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 7567 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 9654 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 19010 cycles (256 bytes)
test 3 (128 bit key, 512 byte blocks): 1 operation in 33118 cycles (512 bytes)
test 4 (128 bit key, 1024 byte blocks): 1 operation in 59738 cycles (1024 bytes)
test 5 (128 bit key, 2048 byte blocks): 1 operation in 106545 cycles (2048 bytes)
test 6 (128 bit key, 4096 byte blocks): 1 operation in 211189 cycles (4096 bytes)
test 7 (128 bit key, 8192 byte blocks): 1 operation in 370439 cycles (8192 bytes)
test 8 (192 bit key, 16 byte blocks): 1 operation in 6780 cycles (16 bytes)
test 9 (192 bit key, 64 byte blocks): 1 operation in 8802 cycles (64 bytes)
test 10 (192 bit key, 256 byte blocks): 1 operation in 17352 cycles (256 bytes)
test 11 (192 bit key, 512 byte blocks): 1 operation in 28680 cycles (512 bytes)
test 12 (192 bit key, 1024 byte blocks): 1 operation in 51230 cycles (1024 bytes)
test 13 (192 bit key, 2048 byte blocks): 1 operation in 96662 cycles (2048 bytes)
test 14 (192 bit key, 4096 byte blocks): 1 operation in 187287 cycles (4096 bytes)
test 15 (192 bit key, 8192 byte blocks): 1 operation in 372570 cycles (8192 bytes)
test 16 (256 bit key, 16 byte blocks): 1 operation in 6273 cycles (16 bytes)
test 17 (256 bit key, 64 byte blocks): 1 operation in 8096 cycles (64 bytes)
test 18 (256 bit key, 256 byte blocks): 1 operation in 15895 cycles (256 bytes)
test 19 (256 bit key, 512 byte blocks): 1 operation in 26259 cycles (512 bytes)
test 20 (256 bit key, 1024 byte blocks): 1 operation in 47121 cycles (1024 bytes)
test 21 (256 bit key, 2048 byte blocks): 1 operation in 91003 cycles (2048 bytes)
test 22 (256 bit key, 4096 byte blocks): 1 operation in 175883 cycles (4096 bytes)
test 23 (256 bit key, 8192 byte blocks): 1 operation in 340904 cycles (8192 bytes)

testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) encryption
test 0 (288 bit key, 16 byte blocks): 1 operation in 12145 cycles (16 bytes)
test 1 (288 bit key, 64 byte blocks): 1 operation in 14538 cycles (64 bytes)
test 2 (288 bit key, 256 byte blocks): 1 operation in 16435 cycles (256 bytes)
test 3 (288 bit key, 512 byte blocks): 1 operation in 15622 cycles (512 bytes)
test 4 (288 bit key, 1024 byte blocks): 1 operation in 18671 cycles (1024 bytes)
test 5 (288 bit key, 2048 byte blocks): 1 operation in 23264 cycles (2048 bytes)
test 6 (288 bit key, 4096 byte blocks): 1 operation in 36480 cycles (4096 bytes)
test 7 (288 bit key, 8192 byte blocks): 1 operation in 75051 cycles (8192 bytes)
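
For the AEAD numbers, a rough sketch of where ChaCha20-Poly1305 overtakes GCM, using the 256-bit-key GCM rows and the rfc7539esp rows above. Note that the GCM run here is the one paired with ghash-generic, as its header line shows. The table and function names are illustrative.

```python
# Cycle counts copied from the tcrypt output above, keyed by block size in bytes.
GCM_AES_256 = {16: 6273, 64: 8096, 256: 15895, 512: 26259,
               1024: 47121, 2048: 91003, 4096: 175883, 8192: 340904}
CHACHA20_POLY1305 = {16: 12145, 64: 14538, 256: 16435, 512: 15622,
                     1024: 18671, 2048: 23264, 4096: 36480, 8192: 75051}


def crossover(baseline, candidate):
    """Smallest block size at which candidate needs fewer cycles, or None."""
    for size in sorted(candidate):
        if candidate[size] < baseline[size]:
            return size
    return None


def ratio_at(baseline, candidate, size):
    """How many times more cycles baseline needs than candidate at one size."""
    return baseline[size] / candidate[size]


if __name__ == "__main__":
    print("crossover at", crossover(GCM_AES_256, CHACHA20_POLY1305), "bytes")
    for size in (4096, 8192):
        r = ratio_at(GCM_AES_256, CHACHA20_POLY1305, size)
        print(f"{size} bytes: {r:.2f}x")
```

With these numbers the crossover is at 512 bytes, and at 4096 bytes the GCM run takes roughly 4.8x as many cycles.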

When AVX-512 comes out, ChaCha20 is going to get even faster - probably by more
than 2x, since they're adding a rotate instruction. I haven't tested on ARM, but
I'd be surprised if the situation is significantly different there (the kernel
lacks a NEON ChaCha20 implementation, but I could do one).
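
To show why a rotate instruction matters: the ChaCha20 quarter-round from RFC 7539 is nothing but 32-bit adds, XORs and rotates, with four rotates per quarter-round. Without a native vector rotate, each one is typically emulated with two shifts and an OR (or a byte shuffle for the 16- and 8-bit cases). Below is a plain Python sketch of the quarter-round, not the kernel's SIMD code; the function names are mine.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (emulated as shift, shift, OR)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """ChaCha20 quarter-round as specified in RFC 7539, section 2.1."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Test vector from RFC 7539, section 2.1.1.
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print(" ".join(f"{word:08x}" for word in out))
```

A full ChaCha20 block runs 80 of these quarter-rounds (20 rounds of 4), so 320 rotates per 64-byte block is where a single-instruction rotate would pay off.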

Just because it's implemented in hardware doesn't mean it's faster...