etseidl commented on PR #9374: URL: https://github.com/apache/arrow-rs/pull/9374#issuecomment-3892586650
Hmm, I had assumed `at_record_boundary` was more sophisticated. Looking at past discussions around this issue (#4943), I don't know what the correct fix is, beyond simply removing the shortcut altogether. The issue with relying on V2 page behavior is that files created by earlier versions of this crate (and possibly other implementations) did not always obey the pages-start-at-a-record-boundary rule. The only way I can think of to really know whether the `has_partial` flag can be cleared is to decode the first repetition level of the next page and see if it's `0`. Beyond that there's really no way to know. I think maybe we should only trust `num_rows` if there are no repetition levels (i.e. no lists present), in which case we know for sure there are no page-spanning rows.
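
To make the two conditions concrete, here is a rough sketch of the decision logic, not the crate's actual code. The function names `can_clear_has_partial` and `can_trust_num_rows` and their parameters are hypothetical and exist only for illustration. It assumes the caller already knows the column's maximum repetition level and, where available, the first decoded repetition level of the following page.

```rust
/// Hypothetical helper (not part of the parquet crate): decides whether a
/// pending partial record can be considered complete.
///
/// `max_rep_level` is the column's maximum repetition level.
/// `next_page_first_rep_level` is the first repetition level decoded from the
/// following page, or `None` if it has not been (or cannot be) decoded.
fn can_clear_has_partial(max_rep_level: i16, next_page_first_rep_level: Option<i16>) -> bool {
    if max_rep_level == 0 {
        // No repetition levels means no lists, so no row can span pages.
        return true;
    }
    // With lists present, a repetition level of 0 marks the start of a new
    // record. Anything else, or no information at all, is treated
    // conservatively as "the record may continue".
    matches!(next_page_first_rep_level, Some(0))
}

/// Hypothetical helper: trust the page header's `num_rows` only when the
/// column has no repetition levels.
fn can_trust_num_rows(max_rep_level: i16) -> bool {
    max_rep_level == 0
}

fn main() {
    // Flat column: safe regardless of what follows.
    println!("{}", can_clear_has_partial(0, None));
    // List column, next page starts a new record.
    println!("{}", can_clear_has_partial(1, Some(0)));
    // List column, next page continues the current record.
    println!("{}", can_clear_has_partial(1, Some(1)));
    println!("{}", can_trust_num_rows(1));
}
```

Treating `None` as "cannot clear" is a deliberately conservative choice in this sketch; a real reader would also need to handle the end of the column chunk, where the last record is necessarily complete.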
