On Wed, 10 Jun 2020 at 07:06, Neale Ferguson <ne...@sinenomine.net> wrote:

> Also, take a look at my post from last week. I used /dev/urandom as the
> "real" test case and /dev/zero as "best case".
>

Ah, but random numbers are not cheap either. I use two large files on the
CMS S-disk as my input and repeat them with "instore | dup * | outstore" to
provide an infinite stream of bytes. One is 7 MB of readable text, the other
is a 2 MB load module. Such a mix is not unlikely when you're compressing
archives, for example.
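
For anyone without Pipelines at hand, the same idea can be sketched in plain
C: read the two sample files once and then cycle over the combined buffer to
produce an endless stream for the compressor. The file names and the 1 GB
cut-off below are placeholders, not my actual setup.

/* Sketch: build a pseudo-infinite input stream by repeating two real files,
   roughly what "instore | dup * | outstore" does in the pipeline above.
   File names and the 1 GB target are made up for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *slurp(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }
    fseek(f, 0, SEEK_END);
    *len = (size_t)ftell(f);
    rewind(f);
    char *buf = malloc(*len);
    if (fread(buf, 1, *len, f) != *len) { perror(path); exit(1); }
    fclose(f);
    return buf;
}

int main(void)
{
    size_t tlen, llen;
    char *text = slurp("sample.txt", &tlen);   /* ~7 MB readable text */
    char *load = slurp("sample.mod", &llen);   /* ~2 MB load module   */

    size_t clen = tlen + llen;
    char *cycle = malloc(clen);
    memcpy(cycle, text, tlen);
    memcpy(cycle + tlen, load, llen);

    /* Write 1 GB of repeated data to stdout; pipe it into a compressor. */
    size_t remaining = 1024UL * 1024 * 1024;
    while (remaining > 0) {
        size_t n = remaining < clen ? remaining : clen;
        fwrite(cycle, 1, n, stdout);
        remaining -= n;
    }
    free(text); free(load); free(cycle);
    return 0;
}

Piping that into the compressor under test gives a repeatable and reasonably
realistic input mix.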

As expected, the throughput of the instruction depends heavily on the block
size. When you invoke DFLTCC on small blocks, the overhead of parameter
block setup and millicode entry and exit reduces throughput; in the extreme
case you'd be quicker doing it yourself in software. My understanding from
the Linux zlib patch is that the blocks were apparently rather small, and
possibly the code still does some double blocking that impacts throughput.
The compression ratio depends only mildly on the block size, and in general
the hardware delivers a ratio comparable to the cheapest mode of zlib.
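
You can see the shape of that per-block overhead even in software zlib,
without any DFLTCC: feed the same buffer through one deflate stream in
chunks of different sizes, all at level 1 (Z_BEST_SPEED, roughly the
"cheapest mode" mentioned above). This is only a sketch with made-up data
and chunk sizes; the DFLTCC parameter block and millicode costs are
analogous in kind, not in magnitude.

/* Sketch: same data, same deflate stream, different per-call chunk sizes.
   Smaller chunks mean more calls and more per-call overhead. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <zlib.h>

static size_t deflate_chunked(unsigned char *src, size_t len, size_t chunk,
                              unsigned char *dst, size_t dstlen)
{
    z_stream zs;
    memset(&zs, 0, sizeof zs);
    if (deflateInit(&zs, Z_BEST_SPEED) != Z_OK) exit(1);

    zs.next_out  = dst;
    zs.avail_out = (uInt)dstlen;       /* dst is compressBound() sized */

    size_t off = 0;
    while (off < len) {
        size_t n = len - off < chunk ? len - off : chunk;
        zs.next_in  = src + off;
        zs.avail_in = (uInt)n;
        off += n;
        int flush = (off == len) ? Z_FINISH : Z_NO_FLUSH;
        int ret;
        do {
            ret = deflate(&zs, flush);
            if (ret == Z_STREAM_ERROR) exit(1);
        } while (zs.avail_in > 0 || (flush == Z_FINISH && ret != Z_STREAM_END));
    }
    size_t out = zs.total_out;
    deflateEnd(&zs);
    return out;
}

int main(void)
{
    const size_t len = 64UL * 1024 * 1024;      /* 64 MB test buffer */
    unsigned char *src = malloc(len);
    for (size_t i = 0; i < len; i++)            /* mildly compressible filler */
        src[i] = (unsigned char)("the quick brown fox "[i % 20]);

    size_t cap = compressBound(len);
    unsigned char *dst = malloc(cap);

    size_t chunks[] = { 256, 4096, 262144, 1048576 };
    for (int i = 0; i < 4; i++) {
        clock_t t0 = clock();
        size_t out = deflate_chunked(src, len, chunks[i], dst, cap);
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("chunk %7zu: %6.2f%% ratio, %6.1f MB/s\n",
               chunks[i], 100.0 * out / len, len / 1048576.0 / s);
    }
    free(src); free(dst);
    return 0;
}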

With a decent block size, an experimental CMS Pipelines stage showed the
following (the figures include the cost of the test scaffolding to compress,
expand, and move the data through the digest stage twice to verify
integrity), so the throughput numbers should be taken as a relative
comparison indicating the order of magnitude.

Time     Mode (1 GB)               Ratio        Transfer
18:26:27 copy                   100.00%      716.1 MB/s
18:26:33 pack                    60.39%      193.2 MB/s
18:27:04 terse                   31.13%       33.2 MB/s
18:27:34 fplunzcp                30.65%       33.4 MB/s
18:27:36 zedccp                  29.33%      747.4 MB/s

As you can see, the compression ratio is around the "terse" level, and it
runs some 20 times quicker. The "fplunzcp" stage is an implementation of the
UNIX compress algorithm. It's interesting to see that going through the
DFLTCC instruction is about as fast as just copying the data without
compression (which actually has to move more data through part of the
pipeline).
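
For completeness, the scaffolding idea itself (compress, expand, push both
ends through a digest, and compare) can be approximated off the mainframe
with zlib's one-shot calls and crc32. Again only a sketch with made-up data
and sizes, and it exercises software deflate rather than DFLTCC, so only the
structure carries over.

/* Rough stand-in for the test scaffolding: compress, expand, digest both
   copies and compare, then report ratio and throughput. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

int main(void)
{
    const uLong len = 64UL * 1024 * 1024;        /* 64 MB test buffer */
    Bytef *src = malloc(len);
    for (uLong i = 0; i < len; i++)
        src[i] = (Bytef)("some moderately compressible text "[i % 34]);

    uLongf clen = compressBound(len);
    Bytef *comp = malloc(clen);
    uLongf blen = len;
    Bytef *back = malloc(blen);

    clock_t t0 = clock();
    if (compress2(comp, &clen, src, len, Z_BEST_SPEED) != Z_OK) return 1;
    if (uncompress(back, &blen, comp, clen) != Z_OK) return 1;
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Digest both ends of the round trip and verify they match. */
    uLong d1 = crc32(0L, src,  (uInt)len);
    uLong d2 = crc32(0L, back, (uInt)blen);
    if (blen != len || d1 != d2) { puts("integrity check failed"); return 1; }

    printf("ratio %.2f%%, %.1f MB/s through compress+expand+digest\n",
           100.0 * clen / len, len / 1048576.0 / secs);

    free(src); free(comp); free(back);
    return 0;
}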

Remember that even a program that "does nothing but compression" will still
spend a lot of time getting the data through the CPU and processing it in
general. So even when the cost of compression drops to zero, that only
eliminates part of the total application cost. Many time-critical
applications on Linux deal with expanding previously compressed data (like
loading Java classes during WAS startup), where the benefit of hardware
support is far less important.

Sir Rob the Plumber

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www2.marist.edu/htbin/wlvindex?LINUX-390
