Hi Radev,

Thanks for the information. It seems interesting.
IMO, Arrow still has a lot to do on data compression. However, it seems there
are some differences between in-memory data compression and external storage
data compression.

Could you please provide a reference for stream splitting?

Best,
Liya Fan

On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.ra...@tum.de> wrote:

> Hello people,
>
>
> There has been discussion on the Apache Parquet mailing list about adding a
> new encoder for FP data.
> The reason for this is that the compressors supported by Apache Parquet
> (zstd, gzip, etc.) do not compress raw FP data well.
>
>
> In my investigation it turns out that a very simple technique,
> named stream splitting, can improve the compression ratio and even speed
> for some of the compressors.
>
> You can read about the results here:
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>
>
> I went through the developer guide for Apache Arrow and wrote a patch to
> add the new encoding and test coverage for it.
>
> I will polish my patch and work in parallel to extend the Apache Parquet
> format for the new encoding.
>
>
> If you have any concerns, please let me know.
>
>
> Regards,
>
> Martin
>
>
