Fun stuff...have felt the pain ;-)
Given that the glossary defines a row group as "[a] logical horizontal
partitioning of the data into rows" (emphasis on "rows", not "records"),
I think that pretty strongly implies that row groups, at least, must
start on a row boundary.
I too would support explicitly stating that pages must start with r==0
when a page index or V2 page header is used, to clear up any remaining
confusion. Although I find page-spanning rows to be a nuisance, I do see
the value in continuing to allow them, as this can lead to more uniform
page sizes when nested schemas are involved.
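
To make the page-size point concrete, here is a rough Python sketch. It is not taken from any Parquet implementation: the names split_pages and rep_levels_from_row_lengths and the "target values per page" limit are hypothetical, and it assumes a single repeated column where r==0 marks the first value of a row.

```python
from typing import List


def rep_levels_from_row_lengths(row_lengths: List[int]) -> List[int]:
    """Build repetition levels for a simple list column (hypothetical helper).

    Each row contributes one level-0 entry followed by level-1 entries.
    Assumes every row has at least one entry.
    """
    return [0 if i == 0 else 1 for n in row_lengths for i in range(n)]


def split_pages(rep_levels: List[int], target: int, align_to_rows: bool) -> List[int]:
    """Return the number of values placed in each page.

    target is a hypothetical soft limit on values per page.
    If align_to_rows is True, a new page may only begin where r == 0.
    """
    sizes: List[int] = []
    current = 0
    for r in rep_levels:
        # Close the page once it is full; when aligning to rows, wait
        # until the next row boundary (r == 0) before doing so.
        if current >= target and (r == 0 or not align_to_rows):
            sizes.append(current)
            current = 0
        current += 1
    if current:
        sizes.append(current)
    return sizes


# One long row (9 values) among short ones, 16 values in total.
levels = rep_levels_from_row_lengths([2, 9, 1, 3, 1])
print(split_pages(levels, target=4, align_to_rows=False))  # [4, 4, 4, 4]
print(split_pages(levels, target=4, align_to_rows=True))   # [11, 4, 1]
```

With page-spanning rows allowed, every page holds exactly 4 values; when pages must start at r==0, the long row forces one oversized page and leaves a tiny trailing one.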
Cheers,
Ed
On 5/10/24 3:45 PM, Jan Finis wrote:
Interesting, thanks for the input so far. Since the spec doesn't say this
exactly, let me spin this one step further:
May *row groups* start with an R-Level > 0? Intuitively, I would say "hell
no", but there is nothing in the Parquet spec that would say that this is
forbidden.
On Fri, May 10, 2024 at 12:21, Micah Kornfield <
[email protected]> wrote: