Re: [DISCUSS] Future of Parquet Versioning

Daniel Weeks Thu, 04 Jun 2026 18:38:32 -0700

Doc is back up, sorry for the interruption.

-Dan


On Thu, Jun 4, 2026 at 3:59 PM Daniel Weeks <[email protected]> wrote:

> Sorry everyone,
>
> I created the document using a new account, and Google flagged it
> (probably because many external accounts accessed the Google Doc).
>
> I'm working to get it restored and if I can't, I'll post a new copy, but
> it won't include the original comments.
>
> -Dan
>
> On Thu, Jun 4, 2026 at 3:50 PM Ed Seidl <[email protected]> wrote:
>
>> On 2026/06/04 22:01:32 Andrew Bell wrote:
>> > How can a reader know that it has the tooling to read a file with this
>> > approach?
>>
>> At present there isn't an in-use mechanism beyond parsing the
>> "created_by" string.
>>
>> > What is the hesitation to change version numbers?
>>
>> Which version number? The version number in the FileMetaData would sort
>> of work, except in the case of an incompatible change made to the
>> metadata. We could change the file magic from PAR1 to something else,
>> but that is not workable beyond PAR9, say. Also, the file magic really
>> shouldn't change frequently as that breaks tools like the unix "file"
>> command.
>>
>> One thought I had, that should not break any current readers, would be to
>> expand the header from 4 to 8 bytes say. We could embed a version number
>> in bytes 4-7. Writing a decimal 2026 perhaps (if we use calendar year
>> only), or 202606. Or use SemVer, one byte each for major/minor/patch. Or
>> make the header longer and embed a fixed-length, space or null padded
>> string. This expanded header shouldn't break current readers since the
>> offset for the first page should be obtained from the ColumnMetaData. If
>> there are readers that rely on a page starting immediately after the
>> 'PAR1', we could mandate that the first byte following PAR1 is 0. A
>> thrift parser would see that as the end of the PageHeader struct and then
>> likely fail on missing required fields.
>>
>> Ed
>>
>>
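
For illustration only, here is a minimal sketch of one way the 8-byte
header Ed describes could look. It combines two of his suggestions: the
byte right after PAR1 is 0, and the remaining three bytes hold a SemVer
triple, one byte each. That particular combination, the names
build_header and parse_header, and the rule for spotting older files are
all assumptions made for this sketch, not anything the thread or the
Parquet spec has settled on.

```python
MAGIC = b"PAR1"
HEADER_LEN = 8  # hypothetical: 4 magic bytes + 4 version bytes


def build_header(major: int, minor: int, patch: int) -> bytes:
    """Return PAR1, a zero guard byte, then major/minor/patch (one byte each)."""
    for part in (major, minor, patch):
        if not 0 <= part <= 255:
            raise ValueError("each version component must fit in one byte")
    # Byte 4 is 0 so that a reader which starts decoding a Thrift PageHeader
    # right after the magic sees a stop byte instead of a field header.
    return MAGIC + bytes([0, major, minor, patch])


def parse_header(buf: bytes):
    """Return (major, minor, patch), or None if no version appears to be present."""
    if buf[:4] != MAGIC:
        raise ValueError("not a Parquet file: bad magic")
    # Assumption: a file without the expanded header has a non-zero byte here
    # (for example the first field header of a page). This is a heuristic only.
    if len(buf) < HEADER_LEN or buf[4] != 0:
        return None
    return (buf[5], buf[6], buf[7])
```

With this layout, build_header(1, 2, 3) yields b"PAR1\x00\x01\x02\x03",
and parse_header on those bytes returns (1, 2, 3). The zero-byte check is
only as reliable as the assumption behind it: an existing file that
happens to have a 0 at offset 4 would be misread, so a real proposal
would need a firmer rule.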
