Hello people,
there has been discussion on the Apache Parquet mailing list about adding a new encoder for floating-point (FP) data. The reason is that the compressors supported by Apache Parquet (zstd, gzip, etc.) do not compress raw FP data well.

My investigation shows that a very simple technique, named stream splitting, can improve the compression ratio and even the speed for some of the compressors. You can read about the results here:
https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view

I went through the developer guide for Apache Arrow and wrote a patch that adds the new encoding along with test coverage for it. I will polish the patch and, in parallel, work on extending the Apache Parquet format for the new encoding.

If you have any concerns, please let me know.

Regards,
Martin
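
P.S. For readers unfamiliar with the technique, here is a rough sketch of the stream-splitting idea. It is an illustration only, not the code from the patch: it assumes 4-byte little-endian floats and a per-byte split, and the names split_streams and merge_streams are made up for this example.

```python
import struct

WIDTH = 4  # assumption: values are 4-byte (float32) numbers


def split_streams(values):
    """Stream k receives byte k of every value; streams are concatenated."""
    raw = struct.pack(f"<{len(values)}f", *values)
    # raw[k::WIDTH] picks byte k of each value, in value order.
    return b"".join(raw[k::WIDTH] for k in range(WIDTH))


def merge_streams(encoded):
    """Inverse of split_streams: interleave the streams back into values."""
    count = len(encoded) // WIDTH
    raw = bytearray(len(encoded))
    for k in range(WIDTH):
        raw[k::WIDTH] = encoded[k * count:(k + 1) * count]
    return list(struct.unpack(f"<{count}f", bytes(raw)))


values = [1.0, 2.5, -3.25]
encoded = split_streams(values)
decoded = merge_streams(encoded)
```

The split itself does not shrink the data. The intent is that bytes with similar roles (for example the sign and exponent bytes) end up next to each other, which a general-purpose compressor applied afterwards may handle better.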