Doc is back up, sorry for the interruption. -Dan

On Thu, Jun 4, 2026 at 3:59 PM Daniel Weeks <[email protected]> wrote:

> Sorry everyone,
>
> I created the document using a new account, and Google flagged it
> (probably because many external accounts accessed the Google Doc).
>
> I'm working to get it restored and if I can't, I'll post a new copy, but
> it won't include the original comments.
>
> -Dan
>
> On Thu, Jun 4, 2026 at 3:50 PM Ed Seidl <[email protected]> wrote:
>
>> On 2026/06/04 22:01:32 Andrew Bell wrote:
>> > How can a reader know that it has the tooling to read a file with this
>> > approach?
>>
>> At present there isn't an in-use mechanism beyond parsing the
>> "created_by" string.
>>
>> > What is the hesitation to change version numbers?
>>
>> Which version number? The version number in the FileMetaData would sort
>> of work, except in the case of an incompatible change made to the
>> metadata. We could change the file magic from PAR1 to something else,
>> but that is not workable beyond PAR9, say. Also, the file magic really
>> shouldn't change frequently as that breaks tools like the unix "file"
>> command.
>>
>> One thought I had, that should not break any current readers, would be
>> to expand the header from 4 to 8 bytes say. We could embed a version
>> number in bytes 4-7. Writing a decimal 2026 perhaps (if we use calendar
>> year only), or 202606. Or use SemVer, one byte each for
>> major/minor/patch. Or make the header longer and embed a fixed-length,
>> space or null padded string. This expanded header shouldn't break
>> current readers since the offset for the first page should be obtained
>> from the ColumnMetaData. If there are readers that rely on a page
>> starting immediately after the 'PAR1', we could mandate that the first
>> byte following PAR1 is 0. A thrift parser would see that as the end of
>> the PageHeader struct and then likely fail on missing required fields.
>>
>> Ed
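
For anyone trying to picture the expanded header Ed describes, here is a rough Python sketch of one possible layout. It is not part of any Parquet specification. It assumes an 8-byte header that combines two of the ideas above: byte 4 is the zero guard byte and bytes 5-7 hold SemVer major/minor/patch, one byte each. The names `pack_header` and `parse_header` are hypothetical, and treating a non-zero byte 4 as "legacy file, no version" is also an assumption of this sketch.

```python
import struct

MAGIC = b"PAR1"
HEADER_LEN = 8  # hypothetical expanded header: 4-byte magic + 4 extra bytes


def pack_header(major: int, minor: int, patch: int) -> bytes:
    """Build the hypothetical header: magic, a zero guard byte, then SemVer."""
    # struct.pack raises struct.error if any value is outside 0..255.
    return MAGIC + struct.pack("4B", 0, major, minor, patch)


def parse_header(buf: bytes):
    """Return (major, minor, patch), or None if no version is embedded."""
    if buf[:4] != MAGIC:
        raise ValueError("bad magic: expected PAR1")
    if len(buf) < HEADER_LEN or buf[4] != 0:
        # Assumed legacy layout: page data starts right after the magic.
        return None
    return tuple(buf[5:8])


header = pack_header(1, 2, 3)
print(header, parse_header(header))
```

A reader built this way would fall back to its existing behaviour whenever `parse_header` returns `None`, and could refuse files whose major version is newer than it supports.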
