ArnavBalyan opened a new pull request, #704:
URL: https://github.com/apache/arrow-go/pull/704
cc @julienledem @alamb
### Rationale for this change
- Implements ALP (Adaptive Lossless floating-point) encoding for float and
double columns, following Prateek's spec. Related to
https://github.com/apache/parquet-format/pull/548.
- ALP converts floating-point values to integers via decimal scaling, then
applies frame-of-reference (FOR) encoding and bit packing. Values that don't
round-trip exactly are stored as exceptions.
- The encoder is incremental and flushes one vector at a time as values
arrive. The decoder is lazy: vectors are decoded on demand, located via an
offset array.
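The lazy decoding described above can be illustrated with an offset array that maps each vector to its byte range, so a read only decodes the vector containing the requested index and caches it. This is a sketch under stated assumptions: `lazyDecoder`, `buildRawPage`, and `rawDecode` are hypothetical, and the raw little-endian float64 payload stands in for an ALP-encoded vector; it does not reflect the PR's actual page layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// lazyDecoder is a hypothetical reader over a page split into fixed-size vectors.
// offsets[k] is the byte position where vector k starts inside data.
type lazyDecoder struct {
	data       []byte
	offsets    []uint32
	vectorSize int // values per vector; only the last vector may be shorter
	numValues  int
	decodeFn   func(payload []byte) []float64 // per-vector decoder (ALP in the real format)
	cache      map[int][]float64              // vectors decoded so far
}

// vectorBytes slices out vector k using its offset and the next vector's offset.
func (d *lazyDecoder) vectorBytes(k int) []byte {
	end := len(d.data)
	if k+1 < len(d.offsets) {
		end = int(d.offsets[k+1])
	}
	return d.data[d.offsets[k]:end]
}

// Value returns value i, decoding only the vector that contains it, at most once.
func (d *lazyDecoder) Value(i int) float64 {
	if i < 0 || i >= d.numValues {
		panic("index out of range")
	}
	k := i / d.vectorSize
	vec, ok := d.cache[k]
	if !ok {
		vec = d.decodeFn(d.vectorBytes(k))
		d.cache[k] = vec
	}
	return vec[i%d.vectorSize]
}

// buildRawPage is a stand-in encoder: each vector is raw little-endian float64s,
// and offsets records where every vector begins.
func buildRawPage(values []float64, vectorSize int) ([]byte, []uint32) {
	var data []byte
	var offsets []uint32
	for i, x := range values {
		if i%vectorSize == 0 {
			offsets = append(offsets, uint32(len(data)))
		}
		data = binary.LittleEndian.AppendUint64(data, math.Float64bits(x))
	}
	return data, offsets
}

// rawDecode is the matching stand-in for a real per-vector ALP decoder.
func rawDecode(p []byte) []float64 {
	out := make([]float64, len(p)/8)
	for i := range out {
		out[i] = math.Float64frombits(binary.LittleEndian.Uint64(p[8*i:]))
	}
	return out
}

func main() {
	values := []float64{0.5, 1.5, 2.5, 3.5, 4.5}
	data, offsets := buildRawPage(values, 2)
	decoded := 0
	dec := &lazyDecoder{
		data: data, offsets: offsets, vectorSize: 2, numValues: len(values),
		decodeFn: func(p []byte) []float64 { decoded++; return rawDecode(p) },
		cache:    map[int][]float64{},
	}
	v := dec.Value(3) // touches only vector 1
	fmt.Println(v, "vectors decoded:", decoded)
}
```

Reading index 3 decodes only vector 1; a later read of index 2 is served from the cache, which is what pairing an offset array with on-demand decoding buys.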
### What changes are included in this PR?
- Wired the ALP implementation through encoder.go and decoder.go, based on
the original
[spec](https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit).
- Registered a new encoding as Encodings.ALP.
- Added unit tests and cross-compatibility tests.
### Are these changes tested?
- Yes. New unit tests were added, and they pass along with the existing
tests.
- Cross-compatibility tests were added: an externally encoded file can be
supplied via an environment variable, which triggers arrow-go ALP decoding of
that file.
### Are there any user-facing changes?
- Yes. A new encoding, Encodings.ALP, is available for float and double
columns.
--
This is an automated message from the Apache Git Service.