mapleFU commented on issue #15107:
URL: https://github.com/apache/arrow/issues/15107#issuecomment-1367326382

   > That would mean using upper bound buffer size for boolean and slicing off 
the unnecessary part after encoding? Sounds like a good idea. Is this what 
other implementations do?
   
   We may not be able to guess an upper bound for the buffer size in the 
encoder. I'd rather buffer the values in `PlainEncoder<BooleanType>` and, on 
`flushValues`, take the buffer and feed all of the values into `RleEncoder`.
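
   A minimal sketch of the buffer-then-encode idea, under stated assumptions: 
the class and method names below (`BufferedBooleanRleEncoder`, `Put`, `Flush`, 
`AppendRleRun`) are hypothetical and are not the Arrow `PlainEncoder` or 
`RleEncoder` API. For simplicity it emits only RLE runs (never bit-packed 
runs) with bit width 1, and it omits any length prefix or page framing. The 
output vector grows as needed, so no upper bound has to be guessed up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the Arrow PlainEncoder/RleEncoder API.
class BufferedBooleanRleEncoder {
 public:
  // Buffer the value; nothing is encoded until Flush() is called.
  void Put(bool value) { buffered_.push_back(value); }

  // Encode all buffered values as RLE runs (bit width 1), then clear the buffer.
  std::vector<uint8_t> Flush() {
    std::vector<uint8_t> out;  // growable, so no size estimate is required
    std::size_t i = 0;
    while (i < buffered_.size()) {
      const bool value = buffered_[i];
      std::size_t run_len = 1;
      while (i + run_len < buffered_.size() && buffered_[i + run_len] == value) {
        ++run_len;
      }
      AppendRleRun(run_len, value, &out);
      i += run_len;
    }
    buffered_.clear();
    return out;
  }

 private:
  static void AppendRleRun(std::size_t run_len, bool value,
                           std::vector<uint8_t>* out) {
    assert(run_len > 0);
    // Run header: ULEB128 varint of (run_len << 1); a low bit of 0 marks an RLE run.
    uint64_t header = static_cast<uint64_t>(run_len) << 1;
    while (header >= 0x80) {
      out->push_back(static_cast<uint8_t>((header & 0x7F) | 0x80));
      header >>= 7;
    }
    out->push_back(static_cast<uint8_t>(header));
    // Repeated value: bit width 1 rounds up to a single byte.
    out->push_back(value ? 1 : 0);
  }

  std::vector<bool> buffered_;
};
```

   For example, putting five `true` values followed by three `false` values 
and then calling `Flush()` yields the four bytes `0x0A 0x01 0x06 0x00`.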
   
   parquet-mr uses a `CapacityByteArrayOutputStream` in 
`RunLengthBitPackingHybridEncoder`, which can grow its buffer as needed.
   
   I didn't find many other implementations; maybe people prefer PLAIN 
encoding? In Rust, parquet2 is not hybrid; it seems to implement only 
bit-packing when encoding. arrow-rs just uses a `BitWriter`, which can resize.
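
   For comparison, a resizable bit writer of the kind mentioned above could 
look roughly like the following. This is only a sketch: `GrowableBitWriter` 
and its methods are hypothetical names, not the arrow-rs `BitWriter` or any 
parquet-mr class. It packs booleans least-significant bit first and appends a 
new byte whenever the current one is full.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical growable bit writer; not the arrow-rs or parquet-mr API.
class GrowableBitWriter {
 public:
  // Append one bit, LSB-first within each byte, growing storage on demand.
  void PutBool(bool value) {
    if (bit_count_ % 8 == 0) {
      bytes_.push_back(0);  // start a new byte when the previous one is full
    }
    if (value) {
      bytes_.back() |= static_cast<uint8_t>(1u << (bit_count_ % 8));
    }
    ++bit_count_;
  }

  std::size_t bit_count() const { return bit_count_; }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
  std::size_t bit_count_ = 0;
};
```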
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
