I agree with Ed on this: the first page of a repeated field in any row group must start with repetition level 0 to be valid.
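To make the rule concrete, here is a minimal sketch of the check I have in mind. It assumes the repetition levels of one column chunk have already been decoded into one list of integers per page. The function names and the input shape are hypothetical and do not come from any Parquet implementation.

```python
from typing import Sequence


def row_group_starts_on_row_boundary(pages: Sequence[Sequence[int]]) -> bool:
    """Check the row-group rule: the first repetition level in the chunk is 0.

    `pages` holds the decoded repetition levels of a single column chunk,
    one sequence per data page (hypothetical input shape).
    """
    for levels in pages:
        if levels:  # skip pages that carry no values
            return levels[0] == 0
    return True  # no values at all, so nothing can violate the rule


def pages_start_on_row_boundaries(pages: Sequence[Sequence[int]]) -> bool:
    """Check the stricter rule discussed for the page index / V2 page headers:
    every non-empty page starts with repetition level 0."""
    return all(levels[0] == 0 for levels in pages if levels)


# Two pages of a repeated field. The second page continues a row that began
# on the first page, so it starts with a repetition level greater than 0.
example_pages = [[0, 1, 1, 0, 1], [1, 1, 0]]
print(row_group_starts_on_row_boundary(example_pages))
print(pages_start_on_row_boundaries(example_pages))
```

With this input the row-group check passes, since the chunk begins a new row, while the per-page check fails because of the page-spanning row.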
On Fri, May 10, 2024 at 4:46 PM Ed Seidl <[email protected]> wrote:

> Fun stuff...have felt the pain ;-)
>
> Given that the glossary defines a row group as "[a] logical horizontal
> partitioning of the data into rows", emphasis "rows" and not "records",
> I think that pretty strongly implies that row groups, at least, must
> start on a row boundary.
>
> I too would be in support of explicitly stating pages start with r==0 in
> the page index or V2 page header cases to clear up any remaining
> confusion. Although I find page-spanning rows to be a nuisance, I do see
> the value in continuing to allow them as this can lead to more uniform
> page sizes when nested schemas are involved.
>
> Cheers,
> Ed
>
> On 5/10/24 3:45 PM, Jan Finis wrote:
> > Interesting, thanks for the input so far. Since the spec doesn't say this
> > exactly, let me spin this one step further:
> >
> > May *row groups* start with an R-Level > 0? Intuitively, I would say
> > "hell no", but there is nothing in the Parquet spec that would say that
> > this is forbidden.
> >
> > On Fri, May 10, 2024 at 12:21 PM Micah Kornfield <[email protected]> wrote:
> >
