simonvandel opened a new issue, #4102:
URL: https://github.com/apache/arrow-rs/issues/4102

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I would like to evaluate whether the `BYTE_STREAM_SPLIT` encoding helps a 
Float64 column compress better, but the encoding does not appear to be 
supported yet: 
https://github.com/apache/arrow-rs/blob/93484a10d145617434432d610e241640a06b382f/parquet/src/encodings/encoding/mod.rs#L90
   
   **Describe the solution you'd like**
   An implementation of the encoding. Even a naive, non-optimized version would 
resolve this issue. The implementation can be improved iteratively.
   
   **Describe alternatives you've considered**
   `PyArrow` seems to support it, but I would really like to stay within the 
Rust world.
   
   **Additional context**
   - Parquet format description here: 
https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9
   - The scalar implementation in the C++ library is here: 
https://github.com/apache/arrow/blob/0bf777a5952be012e41f5b1ad443d4fec38e6f5a/cpp/src/arrow/util/byte_stream_split.h#L579-L602
   The C++ library also has SIMD variants, which will be more involved to port.
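
To make the request concrete, here is a minimal sketch of what a naive, scalar version could look like for `f64`, based on my reading of the format description linked above: the bytes of each value are scattered into 8 streams (stream `k` holds byte `k` of every value, in little-endian order), and the streams are concatenated. The function names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical and are not part of the `parquet` crate; a real implementation would have to plug into the crate's existing encoder and decoder traits.

```rust
/// Width in bytes of one f64 value, i.e. the number of byte streams.
const K: usize = std::mem::size_of::<f64>();

/// Naive BYTE_STREAM_SPLIT encoder sketch (hypothetical helper, not arrow-rs API).
/// Output layout: stream 0 (byte 0 of every value), then stream 1, ..., stream 7.
fn byte_stream_split_encode(values: &[f64]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * K];
    for (i, v) in values.iter().enumerate() {
        // Little-endian bytes of the value, as assumed from the format description.
        let bytes = v.to_le_bytes();
        for (k, b) in bytes.iter().enumerate() {
            // Byte k of value i lands at position i inside stream k.
            out[k * n + i] = *b;
        }
    }
    out
}

/// Inverse of the sketch above: gathers one byte from each stream per value.
fn byte_stream_split_decode(data: &[u8]) -> Vec<f64> {
    assert_eq!(data.len() % K, 0, "input length must be a multiple of 8");
    let n = data.len() / K;
    (0..n)
        .map(|i| {
            let mut bytes = [0u8; K];
            for k in 0..K {
                bytes[k] = data[k * n + i];
            }
            f64::from_le_bytes(bytes)
        })
        .collect()
}
```

The encoded buffer has the same size as the input. Any benefit would come from a later compression stage seeing similar bytes (for example sign and exponent bytes) grouped together, which is exactly what I would like to measure.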
   


