simonvandel opened a new issue, #4102: URL: https://github.com/apache/arrow-rs/issues/4102
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I would like to evaluate whether using the `BYTE_STREAM_SPLIT` encoding helps a Float64 column compress better. But it seems it is not supported yet: https://github.com/apache/arrow-rs/blob/93484a10d145617434432d610e241640a06b382f/parquet/src/encodings/encoding/mod.rs#L90

**Describe the solution you'd like**

An implementation of the encoding. Even a naive, non-optimized version would resolve this issue. The implementation can be improved iteratively.

**Describe alternatives you've considered**

`PyArrow` seems to support it, but I would really like to stay within the Rust world.

**Additional context**

- The Parquet format description is here: https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9
- The scalar implementation in the C++ library is here: https://github.com/apache/arrow/blob/0bf777a5952be012e41f5b1ad443d4fec38e6f5a/cpp/src/arrow/util/byte_stream_split.h#L579-L602. They also have SIMD variations, which will be more involved to port.
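To make the request concrete, a naive scalar version for `f64` might look roughly like the sketch below. It follows the format description linked above: byte `k` of every value goes into stream `k`, and the streams are concatenated in order. The function names are hypothetical and are not part of the `parquet` crate, and the sketch works on plain slices rather than on the crate's encoder/decoder traits, so wiring it into the existing encoding machinery is left open.

```rust
/// Hypothetical helper (not an arrow-rs API): BYTE_STREAM_SPLIT-encode a
/// slice of f64 values. Output layout: 8 streams of `values.len()` bytes
/// each, where stream k holds the k-th little-endian byte of every value.
fn byte_stream_split_encode_f64(values: &[f64]) -> Vec<u8> {
    const WIDTH: usize = std::mem::size_of::<f64>();
    let n = values.len();
    let mut out = vec![0u8; n * WIDTH];
    for (i, v) in values.iter().enumerate() {
        // Parquet stores floating-point values as little-endian IEEE 754.
        let bytes = v.to_le_bytes();
        for (k, b) in bytes.iter().enumerate() {
            // Scatter byte k of value i into position i of stream k.
            out[k * n + i] = *b;
        }
    }
    out
}

/// Hypothetical inverse of the function above. Returns `None` if the input
/// length is not a multiple of 8 bytes.
fn byte_stream_split_decode_f64(encoded: &[u8]) -> Option<Vec<f64>> {
    const WIDTH: usize = std::mem::size_of::<f64>();
    if encoded.len() % WIDTH != 0 {
        return None;
    }
    let n = encoded.len() / WIDTH;
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let mut bytes = [0u8; WIDTH];
        for k in 0..WIDTH {
            // Gather byte k of value i back from stream k.
            bytes[k] = encoded[k * n + i];
        }
        out.push(f64::from_le_bytes(bytes));
    }
    Some(out)
}
```

The encoding does not shrink the data by itself; the output has the same length as the input. Any benefit would come from a subsequent compression codec seeing the sign/exponent bytes grouped together, which is the effect I would like to measure.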
