Would it be better to create a PR against https://github.com/apache/parquet-format, so that it becomes the single source of truth for the Parquet-ALP spec?
On Wed, Jan 14, 2026 at 9:34 AM Julien Le Dem wrote:
Thank you, Micah, for the detailed review! Who else needs to do a round of review on the spec before we can finalize it?

On Tue, Jan 13, 2026 at 10:07 AM PRATEEK GAUR wrote:
Thanks, Micah, for a round of feedback.

Here is a link to the spec document:
https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit

On Tue, Nov 25, 2025 at 8:57 AM PRATEEK GAUR wrote:
> On Sat, Nov 22, 2025 at 4:49 AM Steve Loughran wrote:
> First, sorry: I think I accidentally marked as done the comment in the doc about x86 performance.

No worries, I restored the thread :).

> Those x86 numbers are critical, especially AVX512 on a recent Intel part. There's a notorious feature in the early ones where the cores would reduce frequency after you used the opcodes, as a way of managing die temperature (https://stackoverflow.com/questions/56852812/simd-instructions-lowering-cpu-frequency); the later ones and AMD models are the ones to worry about.

We did collect performance numbers in our early prototype, and they looked good on x86 hardware, though I didn't check the processor family. In our Arrow implementation we are also working on a comprehensive benchmarking script, which will let everyone run it on different CPU families to get a good picture of performance.
Best,
Prateek

On Sat, 22 Nov 2025 at 04:15, Prateek Gaur via dev wrote:
Hi team,

*ALP ---> ALP PseudoDecimal*

As is visible from the numbers above, and as the paper also states, it is very difficult to get a good compression ratio for real double values, i.e. values with many digits of precision.

This is combined with the fact that we want to keep the spec/implementation simpler. Quoting Antoine directly here:

"2. Do not include the ALPrd fallback, which is a homegrown dictionary encoding without dictionary reuse across pages, and instead rely on a well-known Parquet encoding (such as BYTE_STREAM_SPLIT?)"

This is also based on some discussions I had with Julien in person and in the biweekly meeting with a number of you.

We'll be going with ALPpd (pseudo decimal) as the first implementation, relying on the query engine to decide, based on its own heuristics, on the right fallback to BYTE_STREAM_SPLIT or ZSTD.

Best,
Prateek

On Thu, Nov 20, 2025 at 5:09 PM Prateek Gaur wrote:
Sheet with numbers: https://docs.google.com/spreadsheets/d/1NmCg0WZKeZUc6vNXXD8M3GIyNqF_H3goj6mVbT8at7A/edit?gid=1351944517#gid=1351944517

On Thu, Nov 20, 2025 at 5:09 PM PRATEEK GAUR wrote:
Hi team,

There was a request from a few folks (Antoine Pitrou and Adam Reeve, if I remember correctly) to run the experiment on some of the datasets from the papers that discussed BYTE_STREAM_SPLIT, for completeness.
I wanted to share the numbers in this sheet. At this point we have numbers on a wide variety of data. (I have to share the sheet from my Snowflake account, as our laptops have a fair bit of restriction on copy-paste permissions :) )

Best,
Prateek

On Thu, Nov 20, 2025 at 2:25 PM PRATEEK GAUR wrote:
Hi Julien,

Yes, based on:

- the numbers presented,
- the discussions over the doc, and
- multiple discussions in the biweekly meeting,

we are at a stage where we agree this is the right encoding to add, and we can move from the DISCUSS stage to the DRAFT/POC stage. I will start working on the PR for it.

Thanks for bringing this up.
Prateek

On Thu, Nov 20, 2025 at 8:16 AM Julien Le Dem wrote:
@PRATEEK GAUR: Would you agree that we are past the DISCUSS step and into the DRAFT/POC phase according to the proposals process (https://github.com/apache/parquet-format/tree/master/proposals)? If yes, could you open a PR on this page to add this proposal to the list?
https://github.com/apache/parquet-format/tree/master/proposals
Thank you!
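The ALPpd direction chosen above (encode each double as a scaled "pseudo decimal" integer and let the engine fall back to another encoding when that does not work) can be illustrated with a minimal Python sketch. For each vector it picks an exponent e and factor f, turns every double into round(x * 10^e * 10^-f), and keeps any value that does not decode back bit-for-bit (NaN, +/-Inf, -0.0, or simply too many digits) as a positional exception, which is one way the "Exception" mechanism discussed in this thread can cover non-finite values. Assumptions, all hypothetical and not from the Parquet-ALP spec: brute-force search over (e, f) instead of the paper's sampling, a 0 placeholder for exception slots, plain Python ints instead of bit-packed int64, and `pick_encoding` with its 10% threshold as a stand-in for an engine-side fallback heuristic.

```python
import math

# Assumption: search exponents 0..18 (10**18 still fits in a signed 64-bit integer).
MAX_EXPONENT = 18


def _decode_one(n, e, f):
    # Decoding must repeat exactly the arithmetic the encoder verified against.
    return float(n) * (10.0 ** f) * (10.0 ** -e)


def _encode_one(x, e, f):
    """Return an int n that decodes back to x bit-for-bit with (e, f), else None."""
    if not math.isfinite(x):
        return None  # NaN and +/-Inf can never round-trip: they become exceptions
    scaled = x * (10.0 ** e) * (10.0 ** -f)
    if abs(scaled) >= 2.0 ** 62:
        return None  # would not fit comfortably in a signed 64-bit integer
    n = round(scaled)
    # float.hex() tells 0.0 and -0.0 apart, so this is a bitwise check
    return n if _decode_one(n, e, f).hex() == x.hex() else None


def alp_encode(values):
    """Return (e, f, ints, exceptions) for the (e, f) pair with the fewest exceptions."""
    best = None
    for e in range(MAX_EXPONENT + 1):
        for f in range(e + 1):
            ints, exceptions = [], []
            for i, x in enumerate(values):
                n = _encode_one(x, e, f)
                if n is None:
                    exceptions.append((i, x))  # keep position and original double
                    n = 0  # placeholder; a real encoder would choose one that packs well
                ints.append(n)
            spread = max(ints) - min(ints) if ints else 0
            key = (len(exceptions), spread)  # fewer exceptions first, then narrower range
            if best is None or key < best[0]:
                best = (key, e, f, ints, exceptions)
    _, e, f, ints, exceptions = best
    return e, f, ints, exceptions


def alp_decode(e, f, ints, exceptions):
    out = [_decode_one(n, e, f) for n in ints]
    for i, x in exceptions:
        out[i] = x  # patch the exceptions back in at their positions
    return out


def pick_encoding(values, max_exception_ratio=0.1):
    """Hypothetical writer heuristic: fall back when too many values are exceptions."""
    if not values:
        return "ALP"
    *_, exceptions = alp_encode(values)
    ratio = len(exceptions) / len(values)
    return "ALP" if ratio <= max_exception_ratio else "BYTE_STREAM_SPLIT"
```

In this sketch the decoder only reproduces the original doubles because it repeats the encoder's multiplication order and constants exactly, which is the kind of detail a written Parquet-ALP spec would need to pin down for independent implementations.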
On Thu, Oct 30, 2025 at 2:38 PM Andrew Lamb wrote:
I have filed a ticket [1] in arrow-rs to track prototyping ALP in the Rust Parquet reader, if anyone is interested.

Andrew

[1]: https://github.com/apache/arrow-rs/issues/8748

On Wed, Oct 22, 2025 at 1:33 PM Micah Kornfield wrote:
> C++, Java and Rust support them for sure. I feel like we should probably default to V2 at some point.

I seem to recall that some of the vectorized Java readers (Iceberg, Spark) might not support V2 data pages (but I might be confusing this with encodings). This is only a vague recollection, though.

On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb wrote:
> Someone has to add V2 data pages to https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md :)

Your wish is my command: https://github.com/apache/parquet-site/pull/124

As the format grows in popularity and momentum builds to evolve it, I feel the content on the parquet.apache.org site could use refreshing / updating.
So, while I had the site open, I made some other PRs to scratch various itches (I am absolutely 🎣 for someone to please review 🙏):

1. Add Variant/Geometry/Geography types to the implementation status matrix: https://github.com/apache/parquet-site/pull/123
2. Improve the introduction / overview, and add more links to the spec and implementation status: https://github.com/apache/parquet-site/pull/125

Thanks,
Andrew

On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou wrote:
Hi Julien, hi all,

> On Mon, 20 Oct 2025 15:14:58 -0700 Julien Le Dem wrote:
> Another question from me:
>
> Since the goal is to not use compression at all in this case (no ZSTD), I'm assuming we would be using either:
> - Data Page V1 with UNCOMPRESSED in the ColumnMetadata.column (https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887) field.
> - Data Page V2 with false in the DataPageHeaderV2.is_compressed (https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746) field.
>
> The second would help decide whether we can selectively compress some pages if they are less compressed by the
> A few years ago there was a question on the support of DATA_PAGE_V2, and I was curious to hear a refresh on how well it is generally supported in Parquet implementations. The is_compressed field was intended exactly to avoid block compression when the encoding itself is good enough.

Someone has to add V2 data pages to https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md :)

C++, Java and Rust support them for sure. I feel like we should probably default to V2 at some point.

Also see https://github.com/apache/parquet-java/issues/3344 for Java.

Regards,

Antoine.
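To illustrate the per-page choice Julien describes for DataPageHeaderV2.is_compressed, here is a rough Python sketch of a writer that compresses a page's values section only when doing so saves a minimum fraction of space, and otherwise stores the encoded bytes as-is with the flag set to false. Assumptions: zlib stands in for the column codec (e.g. ZSTD), and `PagePayload`, `choose_page_payload`, `read_page_payload` and the 10% `MIN_SAVINGS` threshold are hypothetical names, not part of any Parquet library or of the spec. In V2 pages the repetition and definition levels are stored uncompressed regardless, so only the values section is considered here.

```python
import zlib
from dataclasses import dataclass

# Hypothetical writer-side threshold; not defined by the Parquet spec.
MIN_SAVINGS = 0.10


@dataclass
class PagePayload:
    data: bytes
    is_compressed: bool  # mirrors DataPageHeaderV2.is_compressed


def choose_page_payload(encoded_values: bytes, min_savings: float = MIN_SAVINGS) -> PagePayload:
    """Compress the (already encoded, e.g. ALP) values section only if it pays off."""
    compressed = zlib.compress(encoded_values)  # zlib stands in for ZSTD here
    if len(compressed) <= (1.0 - min_savings) * len(encoded_values):
        return PagePayload(compressed, True)
    # Not worth it: keep the encoded bytes and mark the page as uncompressed
    return PagePayload(encoded_values, False)


def read_page_payload(p: PagePayload) -> bytes:
    # A reader only runs the codec when the page says its values are compressed
    return zlib.decompress(p.data) if p.is_compressed else p.data
```

Because the decision is made page by page, a column chunk can mix compressed and uncompressed pages, which is the selectivity Data Page V1 cannot express: there the codec is chosen once per column chunk.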
On Mon, Oct 20, 2025 at 11:57 AM Andrew Lamb wrote:
Thanks again, Prateek and co, for pushing this along!

> 1. Design and write our own Parquet-ALP spec so that implementations know exactly how to encode and represent data

100% agree with this (similar to what was done for ParquetVariant).

> 2. I may be missing something, but the paper doesn't seem to mention non-finite values (such as +/-Inf and NaNs).

I think they are handled via the "Exception" mechanism. Vortex's ALP implementation (below) does appear to handle non-finite numbers [2].

> 3. It seems there is a single implementation, which is the one published together with the paper. It is not obvious that it will be maintained in the future, and reusing it is probably not an option for non-C++ Parquet implementations.

My understanding from the call was that Prateek and team re-implemented ALP (they did not use the implementation from CWI [3]), but it would be good to confirm.

There is also a Rust implementation of ALP [1] that is part of the Vortex file format implementation. I have not reviewed it to see whether it deviates from the algorithm presented in the paper.

Andrew

[1]: https://github.com/vortex-data/vortex/blob/534821969201b91985a8735b23fc0c415a425a56/encodings/alp/src/lib.rs
[2]: https://github.com/vortex-data/vortex/blob/534821969201b91985a8735b23fc0c415a425a56/encodings/alp/src/alp/compress.rs#L266-L281
[3]: https://github.com/cwida/ALP

On Mon, Oct 20, 2025 at 4:47 AM Antoine Pitrou wrote:
Hello,
Thanks for doing this; I agree the numbers look impressive.

I would ask, if possible, for more data points:

1. More datasets: you could, for example, look at the datasets that were originally used to evaluate BYTE_STREAM_SPLIT (see https://issues.apache.org/jira/browse/PARQUET-1622, specifically the Google Doc linked there).

2. Comparison to BYTE_STREAM_SPLIT + LZ4 and BYTE_STREAM_SPLIT + ZSTD.

3. Optionally, some perf numbers on x86 too, though I expect that ALP will remain very good there as well.

I also have the following reservations about ALP:

1. There is no published official spec AFAICT, just a research paper.

2. I may be missing something, but the paper doesn't seem to mention non-finite values (such as +/-Inf and NaNs).

3. It seems there is a single implementation, which is the one published together with the paper. It is not obvious that it will be maintained in the future, and reusing it is probably not an option for non-C++ Parquet implementations.

4. The encoding itself is complex, since it involves a fallback to another encoding if the primary encoding (which constitutes the real innovation) doesn't work out on a piece of data.

Based on this, I would say that if we think ALP is attractive for us, we may want to incorporate our own version of ALP with the following changes:

1. Design and write our own Parquet-ALP spec so that implementations know exactly how to encode and represent data.

2. Do not include the ALPrd fallback, which is a homegrown dictionary encoding without dictionary reuse across pages, and instead rely on a well-known Parquet encoding (such as BYTE_STREAM_SPLIT?).

3. Replace the FOR encoding inside ALP, which aims at compressing integers efficiently, with our own DELTA_BINARY_PACKED (which has the same qualities and is already available in Parquet implementations).

Regards,

Antoine.

On Thu, 16 Oct 2025 14:47:33 -0700 PRATEEK GAUR wrote:
Hi team,

We spent some time evaluating ALP compression and decompression against other encoding alternatives like CHIMP/GORILLA and compression techniques like SNAPPY/LZ4/ZSTD. We presented these numbers to community members on October 15th in the biweekly Parquet meeting. (I can't seem to access the recording, so please let me know what access rules I need in order to view it.)

We did this evaluation over some datasets pointed to by the ALP paper and some pointed to by the Parquet community.
The results are available in the following document: https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg/edit?tab=t.0

Based on the numbers, we see that:

- ALP is comparable to ZSTD(level=1) in terms of compression ratio and much better than the other schemes (numbers in the sheet are the bytes needed to encode each value).
- ALP does quite well in terms of decompression speed (numbers in the sheet are bytes decompressed per second).

As next steps, we will:

- Get the numbers for compression on top of BYTE_STREAM_SPLIT.
- Evaluate the algorithm over a few more datasets.
- Have an implementation in the arrow-parquet repo.
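For the "compression on top of BYTE_STREAM_SPLIT" comparison, a small Python sketch of that baseline may help. BYTE_STREAM_SPLIT, as defined in the Parquet encodings spec, takes N values of K bytes each and writes K streams, stream j holding byte j of every value, concatenated; a general-purpose codec then compresses the result. Assumptions: zlib stands in for LZ4/ZSTD, and `bytes_per_value` is a hypothetical helper mimicking the "bytes needed to encode each value" metric used in the sheet.

```python
import struct
import zlib

WIDTH = 8  # FLOAT64 values are 8 bytes each


def byte_stream_split_encode(values):
    raw = struct.pack(f"<{len(values)}d", *values)  # little-endian, as Parquet stores doubles
    # Stream j gathers byte j of every value; the streams are written back to back.
    return b"".join(raw[j::WIDTH] for j in range(WIDTH))


def byte_stream_split_decode(data, count):
    streams = [data[j * count:(j + 1) * count] for j in range(WIDTH)]
    # Re-interleave: value i is byte i of stream 0, then byte i of stream 1, and so on.
    raw = bytes(streams[j][i] for i in range(count) for j in range(WIDTH))
    return list(struct.unpack(f"<{count}d", raw))


def bytes_per_value(payload, count):
    """Hypothetical metric helper: compressed size divided by the number of values."""
    return len(zlib.compress(payload)) / count
```

The split itself does not shrink anything; it only groups the sign/exponent bytes and the low mantissa bytes of neighbouring values together, so the following codec sees longer runs of similar bytes.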
Looking forward to feedback from the community.

Best,
Prateek and Dhirhan
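Antoine's third suggested change swaps ALP's internal frame-of-reference (FOR) step for DELTA_BINARY_PACKED. As a rough illustration of what each does to the integers ALP produces, this Python sketch computes the bit width each approach would need. It is deliberately simplified and the function names are hypothetical: the real DELTA_BINARY_PACKED layout has blocks and miniblocks with per-miniblock bit widths and zigzag varint headers, all of which are ignored here.

```python
def for_bit_width(ints):
    """Frame of reference: store min(ints) once, then pack (x - min) at one fixed width.

    ints must be non-empty. Returns (base, bit_width).
    """
    base = min(ints)
    return base, max(x - base for x in ints).bit_length()


def delta_bit_width(ints):
    """Simplified DELTA_BINARY_PACKED: first value, min delta, then packed (delta - min_delta).

    ints must be non-empty. Returns (first_value, min_delta, bit_width).
    """
    deltas = [b - a for a, b in zip(ints, ints[1:])]
    if not deltas:
        return ints[0], 0, 0
    min_delta = min(deltas)
    return ints[0], min_delta, max(d - min_delta for d in deltas).bit_length()
```

For example, `[100, 101, 102, 103]` needs 2 bits per value with FOR but 0 bits with deltas (every delta equals the minimum delta of 1), while `[100, 103, 100, 103]` needs 2 bits with FOR and 3 bits with deltas. In this sketch FOR depends only on the range of the values, whereas the delta variant depends on how much consecutive differences vary.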
