Hi team,

Thanks to the very valuable reviews and feedback from Juliean, Micah, Andrew and others, the FSST proposal is now in the PoC stage and will be worked on in the coming weeks.
I would like to start a discussion on composite encodings for Parquet and get the community's thoughts, feedback and suggestions on nested encodings. Nested (also called composite or hierarchical) encodings are supported in Vortex, FastLanes and others, and are partly supported in Parquet today (Dict + RLE). This proposal discusses formalizing that support and paving the way for future encodings such as Dict + FSST, Delta + RLE and others.

Benchmarks on several well-recognized nested encodings show significant compression gains (on the order of 10x), which are detailed further in the doc:

https://docs.google.com/document/d/1Yi5JwpKEsRFw7D8-iETguRDPtjlyiKITCguYUrrzEVY

Would love to get your thoughts and feedback!

Regards,
- Arnav
