On Wed, 10 Jun 2020 at 07:06, Neale Ferguson <ne...@sinenomine.net> wrote:
> Also, take a look at my post from last week. I used /dev/urandom as the
> "real" test case and /dev/zero as "best case".

Ah, but random numbers are not cheap either. I use two large files on the
CMS S-disk as my input and repeat them with "instore | dup * | outstore"
to provide an infinite stream of bytes. One is 7 MB of readable text, the
other is 2 MB of load module. Such a mix is not unlikely when you're
compressing archives, for example.

As expected, the throughput of the instruction depends heavily on the
block size. When you invoke DFLTCC on small blocks, the overhead of
parameter block setup and millicode entry and exit reduces throughput. In
the extreme case you'd be quicker doing it yourself in software. My
understanding from the Linux zlib patch is that the blocks were
apparently rather small, and possibly the code still does some double
blocking that impacts throughput. The compression ratio mildly depends on
the block size, but in general the hardware delivers a ratio comparable
to the cheapest mode of zlib.

With a decent block size, an experimental CMS Pipelines stage showed the
results below. The figures include the cost of the test scaffolding to
compress, expand, and move the data through the digest stage twice to
verify integrity, so the throughput numbers should be taken as relative,
order-of-magnitude comparisons.

  1 GB      Mode        Ratio    Transfer
 18:26:27   copy      100.00%   716.1 MB/s
 18:26:33   pack       60.39%   193.2 MB/s
 18:27:04   terse      31.13%    33.2 MB/s
 18:27:34   fplunzcp   30.65%    33.4 MB/s
 18:27:36   zedccp     29.33%   747.4 MB/s

As you can see, compression is around "terse" level and runs some 20
times quicker. The "fplunzcp" stage is an implementation of the UNIX
compress algorithm. It's interesting to see that going through the DFLTCC
instruction is about as fast as just copying the data without compression
(which takes more data to move through part of the pipeline).
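The per-block overhead effect can be sketched with plain software zlib,
even without DFLTCC: pay the setup cost on every call and small blocks
suffer. Everything in this sketch (the block sizes, the synthetic input,
the helper name `throughput`) is illustrative, not taken from the test
above.

```python
# Sketch: per-call setup overhead penalizes small blocks, just as
# parameter block setup and millicode entry/exit do for DFLTCC.
# Uses software zlib at level 1; numbers are machine-dependent.
import time
import zlib

data = bytes(range(256)) * 4096  # 1 MB of synthetic input

def throughput(block_size: int, total: int = 16 * 1024 * 1024) -> float:
    """Compress `total` bytes in chunks of `block_size`, with a fresh
    deflate state per call so setup cost is paid on every block.
    Returns throughput in MB/s."""
    block = data[:block_size]
    calls = total // block_size
    start = time.perf_counter()
    for _ in range(calls):
        zlib.compress(block, 1)  # new compressor state each call
    elapsed = time.perf_counter() - start
    return (calls * block_size) / elapsed / 1e6

for size in (512, 4096, 65536, 1024 * 1024):
    print(f"{size:>8} bytes/block: {throughput(size):8.1f} MB/s")
```

On any machine the small-block runs should come out well below the
large-block runs, which is the same shape of curve the instruction shows.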
Remember that even a program that "does nothing but compression" will
still spend a lot of time getting the data through the CPU and processing
the data in general. So even when the cost of compression drops to zero,
it only eliminates a part of the total application cost. Many
time-critical applications on Linux deal with expanding previously
compressed data (like loading Java classes during WAS startup), where the
benefit of hardware support is far less important.

Sir Rob the Plumber

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390
or visit http://www2.marist.edu/htbin/wlvindex?LINUX-390