Sorry everyone, I created the document using a new account, and Google flagged it (probably because many external accounts accessed the Google Doc).
I'm working to get it restored and if I can't, I'll post a new copy, but it won't include the original comments.

-Dan

On Thu, Jun 4, 2026 at 3:50 PM Ed Seidl <[email protected]> wrote:

> On 2026/06/04 22:01:32 Andrew Bell wrote:
> > How can a reader know that it has the tooling to read a file with this
> > approach?
>
> At present there isn't an in-use mechanism beyond parsing the "created_by"
> string.
>
> > What is the hesitation to change version numbers?
>
> Which version number? The version number in the FileMetaData would sort of
> work, except in the case of an incompatible change made to the metadata. We
> could change the file magic from PAR1 to something else, but that is not
> workable beyond PAR9, say. Also, the file magic really shouldn't change
> frequently as that breaks tools like the unix "file" command.
>
> One thought I had, that should not break any current readers, would be to
> expand the header from 4 to 8 bytes say. We could embed a version number in
> bytes 4-7. Writing a decimal 2026 perhaps (if we use calendar year only), or
> 202606. Or use SemVer, one byte each for major/minor/patch. Or make the
> header longer and embed a fixed-length, space or null padded string. This
> expanded header shouldn't break current readers since the offset for the
> first page should be obtained from the ColumnMetaData. If there are readers
> that rely on a page starting immediately after the 'PAR1', we could mandate
> that the first byte following PAR1 is 0. A thrift parser would see that as
> the end of the PageHeader struct and then likely fail on missing required
> fields.
>
> Ed
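
For illustration only, here is a rough Python sketch of one way the expanded 8-byte header from Ed's message might look. It is not part of any Parquet specification. It assumes one particular combination of the ideas floated above: byte 4 is fixed at 0 (the "first byte following PAR1 is 0" suggestion) and bytes 5-7 hold SemVer major/minor/patch, one byte each. The function names `pack_header` and `parse_header` are hypothetical, and treating a non-zero byte 4 as "legacy file" is an assumption of this sketch, not something proposed in the thread.

```python
import struct

MAGIC = b"PAR1"  # the existing 4-byte Parquet file magic


def pack_header(major: int, minor: int, patch: int) -> bytes:
    """Build a hypothetical 8-byte header: magic, a zero byte, then SemVer."""
    # Byte 4 is 0, which a Thrift parser would read as a struct stop marker.
    # struct.pack raises struct.error if any value falls outside 0..255.
    return MAGIC + struct.pack("BBBB", 0, major, minor, patch)


def parse_header(buf: bytes):
    """Return (major, minor, patch), or None if no version bytes are present."""
    if buf[:4] != MAGIC:
        raise ValueError("missing PAR1 magic")
    # Assumption of this sketch: a file with a non-zero byte 4 (or fewer than
    # 8 bytes) is treated as a legacy file with the plain 4-byte header.
    if len(buf) < 8 or buf[4] != 0:
        return None
    _, major, minor, patch = struct.unpack("BBBB", buf[4:8])
    return (major, minor, patch)
```

A reader written this way would still locate the first page through the offsets in ColumnMetaData, as the message notes; the version bytes would only gate whether it attempts to read the file at all.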
