On Thu, Oct 28, 2021 at 04:06:42PM -0600, Shawn Heisey wrote:
> The file I transferred is 4GB in size, copied from /dev/urandom with dd. 
> Did the pull from another machine on the same gigabit LAN.  I picked the
> cipher by watching for TLS 1.2 ciphers shown by testssl.sh and choosing one
> that mentioned AES.  The server has plenty of memory to cache that entire
> 4GB file, so disk speed should be irrelevant.
> 
> Thank you for hanging onto enough patience to help me navigate this rabbit
> hole.

By the way, on this subject: based on the openssl speed numbers you
reported, the speed differences are not even relevant on a network as
slow as 1 Gbps. Your machine can encrypt/decrypt at roughly 2 Gbps per
core even without AES-NI, so in this case it's more important to watch
the CPU utilization during the transfer than the transfer speed itself,
which can be affected by many other factors.

Also, since you performed your transfer using aes-256-gcm, that's the
one you should test. On my machine the differences with and without
AES-NI are huge for this algorithm:

without:
  $ OPENSSL_ia32cap="+0x200000200000000" openssl speed -elapsed -evp aes-256-gcm
  type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
  aes-256-gcm   128654.97k   155005.99k   162485.42k   165428.22k   166909.27k   166914.73k

with:
  $ openssl speed -elapsed -evp aes-256-gcm
  type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
  aes-256-gcm   547722.32k  1457707.18k  2632156.25k  3890468.52k  4604226.22k  4597268.48k

It's roughly 28 times faster on large blocks. At 1 Gbps (~118 MB/s of
payload), this machine would spend roughly 70% of one core in the AES
code without AES-NI, versus about 2.5% with it. That's where you can see
a really measurable difference.
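A small sketch of the arithmetic behind those percentages: the share of one core needed is the link's payload rate divided by the cipher's single-core throughput. It assumes openssl speed's "k" means 1000 bytes per second (as openssl documents) and that the cost scales linearly, ignoring TLS framing, syscalls and copies. The large-block columns are the relevant ones, since a TLS record carries at most 16384 bytes of payload. The function names cpu_share and speedup are hypothetical helpers, not part of any tool.

```sh
#!/bin/sh
# Hypothetical helper: percentage of one core a cipher needs to keep up
# with a link. openssl speed reports throughput in 1000s of bytes/s ("k"),
# so the link rate in MB/s (10^6 bytes/s) is multiplied by 1000 first.
# Usage: cpu_share LINK_MB_PER_S CIPHER_K_PER_S
cpu_share() {
    LC_ALL=C awk -v link="$1" -v cipher="$2" \
        'BEGIN { printf "%.1f\n", 100 * link * 1000 / cipher }'
}

# Hypothetical helper: ratio between two openssl speed figures taken
# for the same block size (slow first, fast second).
# Usage: speedup SLOW_K_PER_S FAST_K_PER_S
speedup() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}
```

With the 8192-byte figures above, `cpu_share 118 166909.27` prints 70.7, `cpu_share 118 4604226.22` prints 2.6 (in line with the roughly 2.5% quoted), and `speedup 166909.27 4604226.22` prints 27.6.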

Of course, as Lukas said, "perf" is very useful here to see where time is
spent.

One trick I often use to measure the effect of micro-optimizations, or of
things like this that only bring a benefit at higher data rates, is to
chain many haproxy instances so that the traffic is processed many times.
I have a config with 100 instances, for example. When your traffic passes
100 times through decryption/encryption, you can start to hope to measure
a big difference :-)
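A minimal sketch of such a chain, not the actual 100-instance config mentioned above. It assumes the hops are proxy sections inside a single haproxy process rather than separate processes. Each hop terminates TLS and re-encrypts towards the next hop on localhost, so every byte goes through N decrypt/encrypt cycles. The gen_chain name, the ports, the certificate path and the final server address are hypothetical placeholders, and the final server is assumed to speak TLS as well.

```sh
#!/bin/sh
# Hypothetical generator: prints an haproxy config that chains N TLS hops.
# Hop i listens on 127.0.0.1:(BASE+i), decrypts, then re-encrypts towards
# hop i+1; the last hop forwards (still over TLS) to FINAL.
# Usage: gen_chain N BASE CERT FINAL
gen_chain() {
    n=$1       # number of hops, e.g. 100
    base=$2    # hop i listens on port base+i
    cert=$3    # PEM file holding certificate + private key (placeholder)
    final=$4   # real TLS server, e.g. 127.0.0.1:8443 (placeholder)

    printf 'defaults\n'
    printf '    mode tcp\n'
    printf '    timeout connect 5s\n'
    printf '    timeout client 30s\n'
    printf '    timeout server 30s\n'

    i=1
    while [ "$i" -le "$n" ]; do
        port=$((base + i))
        if [ "$i" -lt "$n" ]; then
            next="127.0.0.1:$((port + 1))"   # next hop in the chain
        else
            next=$final                      # last hop: the real server
        fi
        printf '\nlisten hop%d\n' "$i"
        printf '    bind 127.0.0.1:%d ssl crt %s\n' "$port" "$cert"
        printf '    server next %s ssl verify none\n' "$next"
        i=$((i + 1))
    done
}
```

For example, `gen_chain 100 8000 /etc/haproxy/test.pem 127.0.0.1:8443 > chain.cfg` followed by `haproxy -c -f chain.cfg` to check it; the client then connects to 127.0.0.1:8001. Keeping everything in one process with tcp-mode hops keeps the config short, at the cost of all hops sharing the same threads.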

Willy
