etseidl commented on PR #9374:
URL: https://github.com/apache/arrow-rs/pull/9374#issuecomment-3892586650

   Hmm, I had assumed `at_record_boundary` was more sophisticated. Looking at 
past discussions around this issue (#4943), I don't know what the correct fix 
is, beyond simply removing the shortcut altogether. The issue with relying on 
V2 page behavior is that files created by earlier versions of this crate (and 
possibly other implementations) did not always obey the 
pages-start-at-a-record-boundary rule. The only way I can think of to really 
know whether the `has_partial` flag can be cleared is to decode the first 
repetition level of the next page and check whether it's `0`. Beyond that 
there's really no way to know.
   
   I think maybe we should only trust `num_rows` if there are no repetition 
levels (i.e. no lists present), in which case we know for sure there are no 
page-spanning rows. 
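
   To make the two cases concrete, here is a rough sketch of the decision. The 
function name and its parameters are hypothetical and do not correspond to 
anything in the crate; `None` is assumed to mean "the next page's first 
repetition level has not been decoded".

```rust
/// Hypothetical helper, not an arrow-rs API: decides whether the
/// `has_partial` flag can safely be cleared at a page boundary.
///
/// * `max_rep_level` is the column's maximum repetition level
///   (0 means the column contains no lists).
/// * `next_page_first_rep_level` is the first repetition level of the
///   next page, if it has been decoded (assumption: `None` = unknown).
fn can_clear_has_partial(
    max_rep_level: i16,
    next_page_first_rep_level: Option<i16>,
) -> bool {
    if max_rep_level == 0 {
        // No repetition levels: every value is its own row, so a row
        // cannot span pages and `num_rows` can be trusted.
        return true;
    }
    // With lists present, only a decoded repetition level of 0 shows that
    // the next page starts a new record. If the level is unknown, stay
    // conservative and keep the flag set.
    matches!(next_page_first_rep_level, Some(0))
}
```

   With this shape, a flat column always returns `true`, while a list column 
returns `true` only after the next page's first level has been decoded as `0`.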


