On Wed, 22 Feb 2017, Jacob Champion wrote:

To make results less confusing, any specific patches/branch I should
test? My baseline is httpd-2.4.25 + httpd-2.4.25-deps
--with-included-apr FWIW.

2.4.25 is just fine. We'll have to make sure there's nothing substantially different about it performance-wise before we backport patches anyway, so it'd be good to start testing it now.

OK.

- The OpenSSL test server, writing from memory: 1.2 GiB/s
- httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s
- httpd trunk with `EnableMMAP off`: 580 MiB/s
- httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s

At those speeds your results might be skewed by the latency of
processing 10 MiB GETs.

Maybe, but keep in mind I care more about the difference between the numbers than the absolute throughput ceiling here. (In any case, I don't see significantly different numbers between 10 MiB and 1 GiB files. Remember, I'm testing via loopback.)

Ah, right.

Discard the results from the first warm-up access, and your results for
delivering from memory or from the disk (cache) shouldn't differ.

Ah, but they *do*, as Yann pointed out earlier. We can't just deliver the disk cache to OpenSSL for encryption; it has to be copied into some addressable buffer somewhere. That seems to be a major reason for the mmap() advantage, compared to a naive read() solution that just reads into a small buffer over and over again.
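
To make the distinction concrete, here is a rough sketch of the two strategies (hypothetical helper names, error handling trimmed; it assumes an already-established SSL * and an open file descriptor, plus files small enough that the size fits in the int that SSL_write() takes):

#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <openssl/ssl.h>

/* Naive loop: every block is first read()-copied out of the page cache
 * into buf, then copied a second time by OpenSSL while it encrypts the
 * data into its record buffer. */
static int send_read_loop(SSL *ssl, int fd)
{
    char buf[64 * 1024];   /* same order as the 64K file-bucket blocks */
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        if (SSL_write(ssl, buf, (int)n) <= 0)
            return -1;
    return n < 0 ? -1 : 0;
}

/* mmap() variant: the page cache itself becomes the addressable buffer,
 * so the only copy left is the one OpenSSL makes while encrypting. */
static int send_mmap(SSL *ssl, int fd)
{
    struct stat st;
    char *p;
    int rc;

    if (fstat(fd, &st) < 0)
        return -1;
    p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    rc = (SSL_write(ssl, p, (int)st.st_size) <= 0) ? -1 : 0;
    munmap(p, st.st_size);
    return rc;
}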

(I am trying to set up Valgrind to confirm where the test server is spending most of its time, but it doesn't care for the large in-memory static buffer, or for OpenSSL's compressed debugging symbols, and crashes. :( )

Any joy with something simpler like gprof? (Caveat: I haven't used it in ages, so I don't know if it's even applicable nowadays.)

Numbers on the "memcopy penalty" would indeed be interesting, especially any variation when the block size differs.
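
Something like the following dumb memcpy loop would give a first approximation of the raw copy cost per block size (a purely illustrative standalone sketch, not httpd code; block sizes picked arbitrarily, error handling omitted, compile with -O2):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TOTAL (1UL << 30)               /* copy 1 GiB per block size */

int main(void)
{
    size_t blksizes[] = { 4096, 65536, 1 << 20 };
    char *src = malloc(1 << 20), *dst = malloc(1 << 20);
    volatile char sink;
    size_t i, done;

    memset(src, 0xaa, 1 << 20);
    for (i = 0; i < sizeof(blksizes) / sizeof(*blksizes); i++) {
        size_t bs = blksizes[i];
        struct timespec t0, t1;
        double secs;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (done = 0; done < TOTAL; done += bs)
            memcpy(dst, src, bs);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        sink = dst[bs - 1];    /* keep the copies from being optimised away */

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%7zu-byte blocks: %.0f MB/s\n", bs, TOTAL / secs / 1e6);
    }
    (void)sink;
    free(src);
    free(dst);
    return 0;
}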

As I said, our live server does 600 MB/s aes-128-gcm and can deliver 300
MB/s https without mmap. That's only a factor-of-2 difference between
aes-128-gcm speed and delivered speed.

Your results above are almost a factor of 4 off, so something's fishy :-)

Well, I can only report my methodology and numbers -- whether the numbers are actually meaningful has yet to be determined. ;D More testers are welcome!

:-)

I did some repeated tests and my initial results were actually a bit on the low side:

The server CPU is an Intel E5606 (first-generation AES-NI offload);
openssl speed -evp aes-128-gcm says:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     208536.05k   452980.05k   567523.33k   607578.11k   619192.32k

Single-stream https over a 10 Gbps link with 3 ms RTT (useful routing SNAFU: when talking to stuff in the neighboring building, traffic takes the "shortcut" through a town 300 km away ;).

Using wget -O /dev/null as the client, on a host with an Intel E5-2630 CPU (960-ish MB/s aes-128-gcm on 8 KiB blocks).

http (sendfile): 1.07 GB/s (repeatedly)

httpd (no mmap): 370-380 MB/s

openssl s_server: 330-340 MB/s

So httpd isn't beaten by the naive openssl s_server approach, at least ;-)


Going off on a tangent here:

For those of you who actually know how the SSL stuff really works: is it possible to get multiple threads involved in doing the encryption, or do you need the results from the previous block in order to do the next one? Yes, I know this wouldn't make sense for most real setups, but for a student computer club with old hardware and good connectivity this is a real problem ;-)

On the other hand, you would need it to do 100 Gbps single-stream https even on the latest & greatest CPUs 8-)


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     ni...@acc.umu.se
---------------------------------------------------------------------------
 There may be a correlation between humor and sex. - Data
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
