I don't know anything about zstd (or competitive compression) for GPU, but doubt it works at the desired granularity. I think SpMV on late-gen CPUs can be accelerated by zstd column index compression, especially for semi-structured problems, but likely also for unstructured problems numbered by breadth-first search or similar. But we'd need to demo that use specifically.
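The transform Jed describes in the quoted message below (subtracting the row number from each stored column index before compressing) can be sketched in a few lines. This is an illustrative sketch, not PETSc's actual API: the CSR layout and function names are mine, and zlib stands in for zstd since it ships with Python's standard library (real zstd would be much faster and tighter).

```python
import zlib  # stdlib stand-in for zstd for illustration only


def delta_cols(rowptr, colidx):
    """Replace each CSR column index with (col - row).  On a structured
    grid the same stencil offsets repeat row after row, so the
    transformed stream compresses far better than the raw indices."""
    out = []
    for row in range(len(rowptr) - 1):
        out.extend(c - row for c in colidx[rowptr[row]:rowptr[row + 1]])
    return out


# 1-D Laplacian stencil, 1000 rows: every interior row has offsets
# (-1, 0, +1), so the delta stream is almost pure repetition.
n = 1000
rowptr, colidx = [0], []
for i in range(n):
    colidx.extend(j for j in (i - 1, i, i + 1) if 0 <= j < n)
    rowptr.append(len(colidx))

raw = b"".join(c.to_bytes(4, "little") for c in colidx)
delta = b"".join((d & 0xFFFFFFFF).to_bytes(4, "little")
                 for d in delta_cols(rowptr, colidx))
print(len(zlib.compress(raw)), len(zlib.compress(delta)))
```

Even with zlib, the delta-transformed index stream compresses to a small fraction of the size of the raw indices on this structured example.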
"Smith, Barry F." <bsm...@mcs.anl.gov> writes:

> CPU to GPU? Especially matrices?
>
>> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>>
>> Zstd is a remarkably good compressor. I've experimented with it for
>> compressing column indices for sparse matrices on structured grids and
>> (after a simple transform: subtracting the row number) gotten
>> decompression speed in the neighborhood of 10 GB/s (i.e., faster per
>> core than DRAM). I've been meaning to follow up. The transformation
>> described below (splitting the bytes) yields decompression speed
>> around 1 GB/s (in the link below), which isn't competitive for things
>> like MatMult, but could be useful for things like trajectory
>> checkpointing.
>>
>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>
>> From: "Radev, Martin" <martin.ra...@tum.de>
>> Subject: Re: Adding a new encoding for FP data
>> Date: July 11, 2019 at 4:55:03 AM CDT
>> To: "d...@arrow.apache.org" <d...@arrow.apache.org>
>> Cc: "Raoofy, Amir" <amir.rao...@tum.de>, "Karlstetter, Roman" <roman.karlstet...@tum.de>
>> Reply-To: <d...@arrow.apache.org>
>>
>> Hello Liya Fan,
>>
>> this post explains the technique, but for a more complex case:
>>
>> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>>
>> For FP data, the approach that seemed best is the following.
>>
>> Say we have a buffer of two 32-bit floating-point values:
>>
>> buf = [af, bf]
>>
>> We interpret each FP value as a 32-bit uint and look at each
>> individual byte. We have 8 bytes in total for this small input:
>>
>> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>>
>> Then we apply stream splitting and the new buffer becomes:
>>
>> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>>
>> We compress newbuf.
>>
>> Due to similarities in the sign bits, mantissa bits, and MSB exponent
>> bits, we may get many more repetitions in the data.
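The byte transposition walked through above can be written out as a short pure-Python sketch. The function names are mine, not Arrow's, and a production implementation would operate on whole buffers, ideally with SIMD:

```python
import struct


def stream_split(values):
    """Byte-transpose 32-bit floats: emit byte 0 of every value, then
    byte 1 of every value, and so on.  For two values [af, bf] this
    yields exactly [af0, bf0, af1, bf1, af2, bf2, af3, bf3]."""
    raw = struct.pack("<%df" % len(values), *values)
    n = len(values)
    return bytes(raw[4 * j + i] for i in range(4) for j in range(n))


def stream_unsplit(data, n):
    """Invert stream_split for a buffer of n 32-bit floats."""
    raw = bytes(data[i * n + j] for j in range(n) for i in range(4))
    return list(struct.unpack("<%df" % n, raw))
```

The transform is lossless: stream_unsplit(stream_split(v), len(v)) returns the original values, and the compressor simply runs over the split buffer instead of the raw one.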
>> For scientific data, the 2nd and 3rd bytes of each 32-bit value are
>> probably largely noise. Thus in the original representation we always
>> have a few bytes of data which could appear somewhere else in the
>> buffer, followed by a couple of bytes of likely noise. In the new
>> representation we instead get one long stream of data which
>> compresses well, followed by a sequence of noise towards the end.
>>
>> This transformation improved the compression ratio, as can be seen in
>> the report.
>>
>> It also improved speed for ZSTD. This could be because ZSTD decides
>> how to compress each block of data - RLE, a new Huffman tree, the
>> Huffman tree of the previous frame, or the raw representation. Each
>> choice can achieve a different compression ratio and
>> compression/decompression speed. It turned out that when the
>> transformation is applied, zstd attempts to compress fewer blocks and
>> copies the others raw, which can mean fewer attempts to build a
>> Huffman tree. It's hard to pinpoint the exact reason.
>>
>> I did not try other lossless compressors, but I expect similar
>> results.
>>
>> For the code, I can polish my patches, create a Jira task, and submit
>> the patches for review.
>>
>> Regards,
>>
>> Martin
>>
>> ________________________________
>> From: Fan Liya <liya.fa...@gmail.com>
>> Sent: Thursday, July 11, 2019 11:32:53 AM
>> To: d...@arrow.apache.org
>> Cc: Raoofy, Amir; Karlstetter, Roman
>> Subject: Re: Adding a new encoding for FP data
>>
>> Hi Radev,
>>
>> Thanks for the information. It seems interesting.
>> IMO, Arrow has much to do for data compression. However, there seem
>> to be some differences between in-memory data compression and
>> external-storage data compression.
>>
>> Could you please provide some reference for stream splitting?
>>
>> Best,
>> Liya Fan
>>
>> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.ra...@tum.de> wrote:
>>
>>> Hello people,
>>>
>>> there has been discussion on the Apache Parquet mailing list about
>>> adding a new encoder for FP data.
>>> The reason for this is that the compressors supported by Apache
>>> Parquet (zstd, gzip, etc.) do not compress raw FP data well.
>>>
>>> In my investigation it turns out that a very simple technique, named
>>> stream splitting, can improve the compression ratio and even the
>>> speed for some of the compressors.
>>>
>>> You can read about the results here:
>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>>
>>> I went through the developer guide for Apache Arrow and wrote a
>>> patch to add the new encoding, with test coverage for it.
>>>
>>> I will polish my patch and work in parallel to extend the Apache
>>> Parquet format for the new encoding.
>>>
>>> If you have any concerns, please let me know.
>>>
>>> Regards,
>>>
>>> Martin
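As a quick way to reproduce the ratio effect Martin reports, the sketch below compresses the same synthetic "smooth field" both raw and byte-split, using zlib from the Python standard library as a stand-in for zstd/gzip. The data and all names here are my own invention, and absolute numbers will differ from those in the linked report:

```python
import math
import struct
import zlib

# Smooth synthetic field: neighboring values share sign and exponent
# bytes, which is exactly the redundancy stream splitting exposes.
values = [math.sin(i / 100.0) + 2.0 for i in range(8192)]
raw = struct.pack("<%df" % len(values), *values)

# Byte-transpose: all byte-0s, then all byte-1s, etc.
n = len(values)
split = bytes(raw[4 * j + i] for i in range(4) for j in range(n))

c_raw = len(zlib.compress(raw, 9))
c_split = len(zlib.compress(split, 9))
print("raw -> %d bytes, split -> %d bytes" % (c_raw, c_split))
```

On data like this the sign/exponent stream collapses to long runs while the noisy low-mantissa streams stay roughly incompressible, so the split buffer compresses smaller than the raw one even with zlib.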