mapleFU commented on issue #15107: URL: https://github.com/apache/arrow/issues/15107#issuecomment-1367326382
> That would mean using upper bound buffer size for boolean and slicing off the unnecessary part after encoding? Sounds like a good idea. Is this what other implementations do?

We may not be able to guess an "upper bound buffer" size in the encoder. I'd rather buffer the values in `PlainEncoder<BooleanType>` and, on `flushValues`, acquire the buffer and put all of them into `RleEncoder`.

parquet-mr uses `CapacityByteArrayOutputStream` in `RunLengthBitPackingHybridEncoder`, which can grow its buffer.

I didn't find other implementations; it seems people may prefer PLAIN encoding. In Rust, parquet2 is not hybrid; it seems to implement only bit-packing when encoding. arrow-rs just uses a `BitWriter`, which can resize.
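
To make the "buffer first, encode on flush" idea concrete, here is a minimal sketch. It is not Arrow's `PlainEncoder` or `RleEncoder`: the class `BufferedBoolRleEncoder`, the `BoolRun` type, and the helper `DecodeRuns` are hypothetical names, and the output is a plain list of (value, run length) pairs rather than the Parquet RLE/bit-packing hybrid format. The only point is that nothing about the output size has to be guessed while values are being added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical illustration only. A run is (value, length); the real
// Parquet hybrid encoding is more involved than this.
using BoolRun = std::pair<bool, uint32_t>;

class BufferedBoolRleEncoder {
 public:
  // Put() only buffers the value; no output buffer is sized here.
  void Put(bool value) { buffered_.push_back(value); }

  // FlushValues() run-length encodes everything buffered so far,
  // then clears the buffer so the encoder can be reused.
  std::vector<BoolRun> FlushValues() {
    std::vector<BoolRun> runs;
    for (bool v : buffered_) {
      if (!runs.empty() && runs.back().first == v) {
        ++runs.back().second;      // extend the current run
      } else {
        runs.emplace_back(v, 1u);  // start a new run
      }
    }
    buffered_.clear();
    return runs;
  }

  std::size_t buffered_count() const { return buffered_.size(); }

 private:
  std::vector<bool> buffered_;
};

// Expands runs back into individual values, to check the round trip.
std::vector<bool> DecodeRuns(const std::vector<BoolRun>& runs) {
  std::vector<bool> out;
  for (const auto& run : runs) {
    assert(run.second > 0);  // FlushValues() never emits an empty run
    out.insert(out.end(), run.second, run.first);
  }
  return out;
}
```

With this shape, a caller would call `Put` once per value and `FlushValues` once per page. The trade-off is the extra memory for the buffered values, compared with a growable output buffer as in parquet-mr or arrow-rs.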
