zeroshade commented on PR #13277:
URL: https://github.com/apache/arrow/pull/13277#issuecomment-1147662934

   @mdepero In the `parquet/internal/utils` package there is a function 
`BytesToBools`, which is explicitly an efficient conversion from bitpacked bytes 
to a `[]bool`. It assumes that the slices are already sized appropriately 
(`len(out)` should equal `len(in)*8`).
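   To make the sizing contract concrete, here is a minimal sketch of what such a conversion does. `bytesToBools` below is a hypothetical, simplified stand-in, not the real `BytesToBools` implementation. It assumes LSB-first bit order within each byte, which is how Parquet/Arrow bitmaps are packed, and it requires `len(out) == len(in)*8`, exactly as described above.

```go
package main

import "fmt"

// bytesToBools is a hypothetical, simplified stand-in (not the real
// utils.BytesToBools): it unpacks LSB-first bitpacked bytes into a []bool.
// Like the function described above, it assumes len(out) == len(in)*8,
// so the caller must size out before calling.
func bytesToBools(in []byte, out []bool) {
	for i, b := range in {
		for bit := 0; bit < 8; bit++ {
			// bit 0 of byte i becomes out[i*8], bit 7 becomes out[i*8+7]
			out[i*8+bit] = b&(1<<uint(bit)) != 0
		}
	}
}

func main() {
	in := []byte{0x05} // 0b00000101: bits 0 and 2 set
	out := make([]bool, len(in)*8)
	bytesToBools(in, out)
	fmt.Println(out) // [true false true false false false false false]
}
```

   Because every input byte expands to eight `bool`s, an `out` slice sized for some other element type wastes memory, and one sized too small panics on an out-of-range index.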
   
   That being said, there might be a way we can get around this by changing the 
implementation slightly. 
   
   Since the `columnChunkReader` is embedded in the typed readers, we don't 
"technically" need to do `cr.columnChunkReader.skipValues(...` and could 
instead do `cr.skipValues(...`. I only specified the `cr.columnChunkReader` 
portion to make it explicit. The benefit of converting it to just 
`cr.skipValues` is that we can then override the `skipValues` function for the 
`BooleanColumnChunkReader` to allocate the *correct* amount of scratch space. 
It does result in some duplication of code, but I think it's a better solution 
because it avoids the extra allocation where possible. As another, forward-looking 
idea, I'd probably want the scratch space to come from a pool of buffers rather 
than allocating new scratch space for every skip, but that can be done in a later 
change. Anyway, does that all make sense as something you could do? 
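   A rough sketch of the shadowing idea, plus the pooled-scratch follow-up, is below. All type and field names here are hypothetical simplifications, not the real `parquet/file` reader types: `typeSize` is an assumed stand-in for however the generic path sizes its scratch, and `boolScratchPool` is an illustrative `sync.Pool`, not an existing pool in the codebase. Calling `cr.skipValues` on the boolean reader resolves to its own method, while `cr.columnChunkReader.skipValues` still reaches the embedded one.

```go
package main

import (
	"fmt"
	"sync"
)

// columnChunkReader is a hypothetical, simplified shared reader.
type columnChunkReader struct {
	typeSize int // assumed bytes of scratch per value for the generic path
}

// skipValues is the generic path: it allocates fresh scratch on every call
// and returns the scratch size so callers can see which method ran.
func (c *columnChunkReader) skipValues(n int) int {
	scratch := make([]byte, n*c.typeSize)
	// ... decode and discard n values into scratch (omitted) ...
	return len(scratch)
}

// BooleanColumnChunkReader embeds columnChunkReader; declaring its own
// skipValues shadows the embedded method for calls made on this type.
type BooleanColumnChunkReader struct {
	columnChunkReader
}

// boolScratchPool is a hypothetical pool of reusable []bool scratch buffers.
// It stores *[]bool so Put does not allocate when boxing the slice header.
var boolScratchPool = sync.Pool{
	New: func() interface{} {
		b := make([]bool, 0, 1024)
		return &b
	},
}

// skipValues sizes the scratch to exactly one bool per skipped value and
// reuses pooled buffers instead of allocating on every skip.
func (cr *BooleanColumnChunkReader) skipValues(n int) int {
	bp := boolScratchPool.Get().(*[]bool)
	buf := *bp
	if cap(buf) < n {
		buf = make([]bool, n) // grow; the larger buffer goes back to the pool
	}
	buf = buf[:n]
	// ... decode and discard n values into buf (omitted) ...
	used := len(buf)
	*bp = buf[:0]
	boolScratchPool.Put(bp)
	return used
}

func main() {
	cr := &BooleanColumnChunkReader{columnChunkReader{typeSize: 8}}
	fmt.Println(cr.skipValues(10))                   // 10: boolean override
	fmt.Println(cr.columnChunkReader.skipValues(10)) // 80: embedded method
}
```

   One caveat with this approach: Go embedding gives method shadowing, not virtual dispatch. Any method defined on `columnChunkReader` that internally calls `c.skipValues` still calls the embedded version, so the call sites themselves need to go through the typed reader (`cr.skipValues`) for the boolean override to take effect.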
   
   Let me know if you have any questions. Thanks again for this!

