Hi Deepak,

The ORC C++ writer does write the “ORC” magic at the beginning of the file. However, the reader does not verify it when opening the file (the same is true of the Java reader, as far as I can tell). There is probably a reason for that: since the reader already verifies the postscript at the file tail, checking the header as well is unnecessary and would require an additional IO.
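A minimal C++ sketch of why the tail check can stand in for the header check: the magic can be validated from the same bytes the reader already fetches from the end of the file. This is a simplified byte-level illustration, not the code either reader uses. It assumes `tail` holds the last bytes of the file, that the final byte is the PostScript length, and that the PostScript's `magic` field (declared last in the message) is serialized last, so the three bytes just before the length byte spell "ORC". The function name `tailHasOrcMagic` is hypothetical; a real reader parses the PostScript protobuf rather than matching raw bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks the ORC magic using only the file tail.
// Assumption: the last byte is the PostScript length, and the PostScript
// ends with its `magic` field, so the bytes right before the length byte
// are "ORC". No extra read of the file header is needed.
bool tailHasOrcMagic(const std::string& tail) {
  static const std::string kMagic = "ORC";
  if (tail.size() < kMagic.size() + 1) {
    return false;  // too short to hold the magic plus the length byte
  }
  const std::size_t psLen = static_cast<unsigned char>(tail.back());
  // The PostScript must contain at least the magic and fit in the buffer.
  if (psLen < kMagic.size() || psLen + 1 > tail.size()) {
    return false;
  }
  const std::size_t magicPos = tail.size() - 1 - kMagic.size();
  return tail.compare(magicPos, kMagic.size(), kMagic) == 0;
}
```

Because the reader must read the tail anyway to find the footer, a check like this costs no additional IO, whereas checking the header means a second read at offset 0.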
> On Dec 15, 2017, at 8:55 AM, Owen O'Malley <[email protected]> wrote:
>
> On a side note, I put a patch in to the Linux 'file' command that makes it
> recognize ORC files. If you've got file 5.31 or later, you'll get:
>
> owen@laptop> file examples/*.orc
> examples/TestOrcFile.columnProjection.orc: Apache ORC
> examples/TestOrcFile.emptyFile.orc: Apache ORC
> examples/TestOrcFile.metaData.orc: Apache ORC
> examples/TestOrcFile.test1.orc: Apache ORC
> examples/TestOrcFile.testDate1900.orc: Apache ORC
> examples/TestOrcFile.testDate2038.orc: Apache ORC
> examples/TestOrcFile.testMemoryManagementV11.orc: Apache ORC
> examples/TestOrcFile.testMemoryManagementV12.orc: Apache ORC
> examples/TestOrcFile.testPredicatePushdown.orc: Apache ORC
> examples/TestOrcFile.testSeek.orc: Apache ORC
> examples/TestOrcFile.testSnappy.orc: Apache ORC
> examples/TestOrcFile.testStringAndBinaryStatistics.orc: Apache ORC
> examples/TestOrcFile.testStripeLevelStats.orc: Apache ORC
> examples/TestOrcFile.testTimestamp.orc: Apache ORC
> examples/TestOrcFile.testUnionAndTimestamp.orc: Apache ORC
> examples/TestOrcFile.testWithoutIndex.orc: Apache ORC
> examples/TestVectorOrcFile.testLz4.orc: Apache ORC
> examples/TestVectorOrcFile.testLzo.orc: Apache ORC
> examples/decimal.orc: Apache ORC
> examples/demo-11-none.orc: Apache ORC
> examples/demo-11-zlib.orc: Apache ORC
> examples/demo-12-zlib.orc: Apache ORC
> examples/nulls-at-end-snappy.orc: Apache ORC
> examples/orc-file-11-format.orc: Apache ORC
> examples/orc_index_int_string.orc: Apache ORC
> examples/orc_split_elim.orc: Apache ORC
> examples/orc_split_elim_new.orc: Apache ORC
> examples/over1k_bloom.orc: Apache ORC
> examples/version1999.orc: Apache ORC
> examples/zero.orc: empty
>
> It looks like the file command has finally added negative offsets from the
> end of the file, so we could extend it with more information.
>
> ..
> Owen
>
>
> On Fri, Dec 15, 2017 at 7:00 AM, Deepak Majeti <[email protected]> wrote:
>
>> Hi Xiening,
>>
>> The readers (both java and c++) just use the "magic" bits present in the
>> Tail to verify ORC files. But the spec requires "ORC" bits to be present in
>> the header as well to support tools that scan from the front.
>> You can verify this from the ORC files written by the Java writer.
>> I just observed this requirement today as well. We should support this with
>> the C++ writer too if we don't already.
>>
>>
>> On Fri, Dec 15, 2017 at 2:45 AM, Dain Sundstrom <[email protected]> wrote:
>>
>>> Thanks Deepak. I was searching for “magic” and missed this part.
>>>
>>> -dain
>>>
>>>> On Dec 14, 2017, at 7:16 PM, Deepak Majeti <[email protected]> wrote:
>>>>
>>>> Hi Dain,
>>>>
>>>> The ORC spec requires that a file start with "ORC".
>>>>
>>>> From https://orc.apache.org/docs/file-tail.html
>>>>
>>>> "The file is broken in to three parts- Header, Body, and Tail. The Header
>>>> consists of the bytes “ORC’’ to support tools that want to scan the front
>>>> of the file to determine the type of the file."
>>>>
>>>> On Thu, Dec 14, 2017 at 2:00 PM, Dain Sundstrom <[email protected]> wrote:
>>>>
>>>>> Does the ORC spec require that a file start with “ORC”?
>>>>>
>>>>> -dain
>>>>
>>>> --
>>>> regards,
>>>> Deepak Majeti
>>>
>>
>> --
>> regards,
>> Deepak Majeti
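Per the spec text Deepak quotes, a tool that scans from the front only needs the first three bytes of the file. The sketch below (the function name `fileHasOrcHeader` is hypothetical) does one small read at offset 0 using standard C++ streams. It is the kind of check a front-scanning tool performs; it says nothing about whether the rest of the file is valid ORC.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` begins with the
// 3-byte ORC header "ORC". Missing files and files shorter than 3 bytes
// (including empty files) return false.
bool fileHasOrcHeader(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  char buf[3];
  // read() fails (and the stream tests false) if fewer than 3 bytes exist.
  if (!in.read(buf, sizeof buf)) {
    return false;
  }
  return std::string(buf, sizeof buf) == "ORC";
}
```

A zero-length file such as `examples/zero.orc` in Owen's listing would return false here, since it has no header bytes to read.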
