CPU to GPU? Especially matrices?
> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>
> Zstd is a remarkably good compressor. I've experimented with it for
> compressing column indices for sparse matrices on structured grids and
> (after a simple transform: subtracting the row number) gotten
> decompression speed in the neighborhood of 10 GB/s (i.e., faster per
> core than DRAM). I've been meaning to follow up. The transformation
> described below (splitting the bytes) yields decompression speed
> around 1 GB/s (in the link below), which isn't competitive for things
> like MatMult, but could be useful for things like trajectory
> checkpointing.
>
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>
>
> From: "Radev, Martin" <martin.ra...@tum.de>
> Subject: Re: Adding a new encoding for FP data
> Date: July 11, 2019 at 4:55:03 AM CDT
> To: "d...@arrow.apache.org" <d...@arrow.apache.org>
> Cc: "Raoofy, Amir" <amir.rao...@tum.de>, "Karlstetter, Roman" <roman.karlstet...@tum.de>
> Reply-To: <d...@arrow.apache.org>
>
>
> Hello Liya Fan,
>
> this explains the technique, but for a more complex case:
> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>
> For FP data, the approach which seemed best is the following.
> Say we have a buffer of two 32-bit floating-point values:
>
> buf = [af, bf]
>
> We interpret each FP value as a 32-bit uint and look at each individual byte.
> We have 8 bytes in total for this small input.
>
> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>
> Then we apply stream splitting and the new buffer becomes:
>
> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>
> We compress newbuf.
>
> Due to similarities in the sign bits, mantissa bits, and MSB exponent
> bits, we might have many more repetitions in the data. For scientific
> data, the 2nd and 3rd bytes of each 32-bit value are probably largely
> noise.
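[Editor's note: the byte transposition quoted above can be sketched in pure Python. This is an illustrative version, not the actual Arrow/Parquet patch; the names `stream_split` and `stream_unsplit` are hypothetical.]

```python
import struct

def stream_split(values):
    # Reinterpret each float32 as 4 bytes, then emit byte 0 of every
    # value, then byte 1 of every value, and so on (a 4xN transpose).
    raw = struct.pack("<%df" % len(values), *values)
    return b"".join(raw[i::4] for i in range(4))

def stream_unsplit(data):
    # Inverse transform: with n values, byte k of value j sits at
    # position k*n + j in the split buffer.
    n = len(data) // 4
    raw = bytes(data[k * n + j] for j in range(n) for k in range(4))
    return list(struct.unpack("<%df" % n, raw))
```

For `[af, bf]` this produces exactly the `[af0, bf0, af1, bf1, af2, bf2, af3, bf3]` layout described above; a real implementation would do the transpose in blocks for speed.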
> Thus, in the original representation we would always have a few bytes
> of data which could appear somewhere else in the buffer, followed by a
> couple of bytes of likely noise. In the new representation we have a
> long stream of data which compresses well, followed by a sequence of
> noise towards the end.
>
> This transformation improved the compression ratio, as can be seen in
> the report.
>
> It also improved speed for zstd. This could be because zstd decides
> how to compress each frame - RLE, a new Huffman tree, the Huffman tree
> of the previous frame, or the raw representation. Each can achieve a
> different compression ratio and compression/decompression speed. It
> turned out that when the transformation is applied, zstd attempts to
> compress fewer frames and copies the others. This could lead to fewer
> attempts to build a Huffman tree. It's hard to pinpoint the exact
> reason.
>
> I did not try other lossless compressors, but I expect similar results.
>
> For code, I can polish my patches, create a Jira task, and submit the
> patches for review.
>
>
> Regards,
>
> Martin
>
>
> ________________________________
> From: Fan Liya <liya.fa...@gmail.com>
> Sent: Thursday, July 11, 2019 11:32:53 AM
> To: d...@arrow.apache.org
> Cc: Raoofy, Amir; Karlstetter, Roman
> Subject: Re: Adding a new encoding for FP data
>
> Hi Radev,
>
> Thanks for the information. It seems interesting.
> IMO, Arrow has much to do for data compression. However, there seem to
> be some differences between in-memory data compression and external
> storage data compression.
>
> Could you please provide some references for stream splitting?
>
> Best,
> Liya Fan
>
> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.ra...@tum.de> wrote:
>
>> Hello people,
>>
>>
>> there has been discussion on the Apache Parquet mailing list about
>> adding a new encoder for FP data.
>> The reason for this is that the compressors supported by Apache
>> Parquet (zstd, gzip, etc.) do not compress raw FP data well.
>>
>> In my investigation it turned out that a very simple technique, named
>> stream splitting, can improve the compression ratio and even the
>> speed of some of the compressors.
>>
>> You can read about the results here:
>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>
>>
>> I went through the developer guide for Apache Arrow and wrote a patch
>> to add the new encoding and test coverage for it.
>>
>> I will polish my patch and work in parallel to extend the Apache
>> Parquet format for the new encoding.
>>
>>
>> If you have any concerns, please let me know.
>>
>>
>> Regards,
>>
>> Martin
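[Editor's note: the column-index transform Jed mentions at the top of the thread (subtracting the row number before compressing) can be sketched as below. This is a hedged illustration: `zlib` stands in for zstd so the example needs only the standard library, and the 5-point-stencil pattern is a simplified stand-in for a real structured-grid CSR matrix.]

```python
import struct
import zlib

def delta_transform(row_ptr, col_idx):
    # For each CSR row r, replace every column index by (col - r).
    # On structured grids most rows then become identical small tuples.
    out = []
    for r in range(len(row_ptr) - 1):
        out.extend(c - r for c in col_idx[row_ptr[r]:row_ptr[r + 1]])
    return out

def pack_i32(ints):
    # Serialize as little-endian signed 32-bit integers.
    return struct.pack("<%di" % len(ints), *ints)

# Build a 5-point-stencil-like pattern on an nx*nx grid (out-of-range
# neighbors dropped; row-edge wrap-around ignored for simplicity).
nx = 32
row_ptr, col_idx = [0], []
for r in range(nx * nx):
    stencil = [r - nx, r - 1, r, r + 1, r + nx]
    col_idx.extend(c for c in stencil if 0 <= c < nx * nx)
    row_ptr.append(len(col_idx))

raw = zlib.compress(pack_i32(col_idx), 9)
delta = zlib.compress(pack_i32(delta_transform(row_ptr, col_idx)), 9)
# The delta-transformed stream is far more repetitive, so it should
# compress noticeably smaller than the raw indices.
```

The same idea is what makes the reported 10 GB/s decompression plausible: after the subtraction, most of the stream is one repeating tuple, which is the easy case for an LZ-style decompressor.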