Fun stuff...have felt the pain ;-)

The glossary defines a row group as "[a] logical horizontal partitioning of the data into rows" (note "rows", not "records"). I think that pretty strongly implies that row groups, at least, must start on a row boundary.

I too would support explicitly stating that pages must start with r==0 when a page index or V2 page headers are used, to clear up any remaining confusion. Although I find page-spanning rows to be a nuisance, I do see the value in continuing to allow them otherwise, as this can lead to more uniform page sizes when nested schemas are involved.
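
To make the trade-off concrete, here is a minimal sketch in Python. It assumes a single list column where each page is just the list of its repetition levels; the names (pages_start_on_row_boundary, by_value_count, by_row_boundary) are made up for illustration and are not from any Parquet implementation.

```python
from typing import List


def pages_start_on_row_boundary(pages: List[List[int]]) -> List[bool]:
    """For each page (given as its repetition levels), report whether the
    page begins a new row, i.e. whether its first repetition level is 0.
    An empty page is treated as trivially fine (an assumption of this sketch)."""
    return [not page or page[0] == 0 for page in pages]


# Three rows of a list column: [a, b, c], [d], [e, f]
rep_levels = [0, 1, 1, 0, 0, 1]

# Cut every 2 values: uniform page sizes, but the second page starts mid-row.
by_value_count = [rep_levels[0:2], rep_levels[2:4], rep_levels[4:6]]

# Cut only where r == 0: uneven page sizes, but every page starts a new row.
by_row_boundary = [rep_levels[0:3], rep_levels[3:4], rep_levels[4:6]]

print(pages_start_on_row_boundary(by_value_count))
print(pages_start_on_row_boundary(by_row_boundary))
```

The first split is what allowing page-spanning rows buys (even page sizes); the second is what a "pages start with r==0" rule would require.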

Cheers,
Ed

On 5/10/24 3:45 PM, Jan Finis wrote:
Interesting, thanks for the input so far. Since the spec doesn't say this
exactly, let me spin this one step further:

May *row groups* start with an R-Level > 0? Intuitively, I would say "hell
no", but there is nothing in the Parquet spec that would say that this is
forbidden.
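
To illustrate why this feels wrong, here is a rough Python sketch of rebuilding rows from the repetition levels of a single list column. The function name split_into_rows is hypothetical and the logic is simplified to one level of nesting; it is not how any particular reader is implemented.

```python
from typing import List, Sequence


def split_into_rows(rep_levels: Sequence[int], values: Sequence[str]) -> List[List[str]]:
    """Group values into rows: r == 0 opens a new row, r > 0 continues
    the current one. Simplified to a single level of nesting."""
    rows: List[List[str]] = []
    for r, v in zip(rep_levels, values):
        if r == 0:
            rows.append([v])
        elif not rows:
            # Nothing has been opened yet, so this value has no row to join.
            raise ValueError("row group starts with repetition level > 0")
        else:
            rows[-1].append(v)
    return rows


print(split_into_rows([0, 1, 1, 0, 0, 1], list("abcdef")))
```

With a leading level > 0 (for example levels [1, 0]), the first value would belong to a row that lives in the previous row group, which this sketch rejects.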

On Fri, May 10, 2024 at 12:21, Micah Kornfield <
emkornfi...@gmail.com> wrote:

