alamb opened a new pull request, #49154: URL: https://github.com/apache/arrow/pull/49154
This builds on the following PR from @prtkgaur:
- https://github.com/apache/arrow/pull/48345

It contains a binary that creates files using the new ALP encoding proposed here:
- https://github.com/apache/parquet-format/pull/548

I don't intend to merge this PR. Instead, I plan to use it to create test Parquet files, and I am posting it in case anyone else is interested.

To build:

```shell
cd arrow/cpp
cmake -S . -B build -DARROW_PARQUET=ON -DPARQUET_BUILD_EXAMPLES=ON \
  -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
  -DARROW_MIMALLOC=OFF -DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE
MAKEFLAGS=-j8 cmake --build build --target parquet-write-parquet
```

To run:

```shell
cd arrow/cpp
./build/release/parquet-write-parquet --encoding ALP /tmp
```

This writes a file like this one to `/tmp`: [single_f64_ALP.zip](https://github.com/user-attachments/files/25097841/single_f64_ALP.zip)

TODO: make sure the following patterns [from the spec](https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit) are covered:

1. pages with no exceptions
2. encoding with exceptions and NaN, Inf, etc.
3. multiple ALP vector sizes (1 -> 15 == 65k)
4. both f32 and f64 variants
