On a side note, I put a patch in to the Linux 'file' command that makes it recognize ORC files. If you've got file 5.31 or later, you'll get:
owen@laptop> file examples/*.orc
examples/TestOrcFile.columnProjection.orc: Apache ORC
examples/TestOrcFile.emptyFile.orc: Apache ORC
examples/TestOrcFile.metaData.orc: Apache ORC
examples/TestOrcFile.test1.orc: Apache ORC
examples/TestOrcFile.testDate1900.orc: Apache ORC
examples/TestOrcFile.testDate2038.orc: Apache ORC
examples/TestOrcFile.testMemoryManagementV11.orc: Apache ORC
examples/TestOrcFile.testMemoryManagementV12.orc: Apache ORC
examples/TestOrcFile.testPredicatePushdown.orc: Apache ORC
examples/TestOrcFile.testSeek.orc: Apache ORC
examples/TestOrcFile.testSnappy.orc: Apache ORC
examples/TestOrcFile.testStringAndBinaryStatistics.orc: Apache ORC
examples/TestOrcFile.testStripeLevelStats.orc: Apache ORC
examples/TestOrcFile.testTimestamp.orc: Apache ORC
examples/TestOrcFile.testUnionAndTimestamp.orc: Apache ORC
examples/TestOrcFile.testWithoutIndex.orc: Apache ORC
examples/TestVectorOrcFile.testLz4.orc: Apache ORC
examples/TestVectorOrcFile.testLzo.orc: Apache ORC
examples/decimal.orc: Apache ORC
examples/demo-11-none.orc: Apache ORC
examples/demo-11-zlib.orc: Apache ORC
examples/demo-12-zlib.orc: Apache ORC
examples/nulls-at-end-snappy.orc: Apache ORC
examples/orc-file-11-format.orc: Apache ORC
examples/orc_index_int_string.orc: Apache ORC
examples/orc_split_elim.orc: Apache ORC
examples/orc_split_elim_new.orc: Apache ORC
examples/over1k_bloom.orc: Apache ORC
examples/version1999.orc: Apache ORC
examples/zero.orc: empty

It looks like the file command has finally added negative offsets from the end of the file, so we could extend it with more information.

.. Owen

On Fri, Dec 15, 2017 at 7:00 AM, Deepak Majeti <[email protected]> wrote:

> Hi Xiening,
>
> The readers (both java and c++) just use the "magic" bits present in the
> Tail to verify ORC files. But the spec requires "ORC" bits to be present in
> the header as well to support tools that scan from the front.
> You can verify this from the ORC files written by the Java writer.
> I just observed this requirement today as well. We should support this with
> the C++ writer too if we don't already.
>
>
> On Fri, Dec 15, 2017 at 2:45 AM, Dain Sundstrom <[email protected]> wrote:
>
> > Thanks Deepak. I was searching for “magic” and missed this part.
> >
> > -dain
> >
> > > On Dec 14, 2017, at 7:16 PM, Deepak Majeti <[email protected]> wrote:
> > >
> > > Hi Dain,
> > >
> > > The ORC spec requires that a file start with "ORC".
> > >
> > > From https://orc.apache.org/docs/file-tail.html
> > >
> > > "The file is broken in to three parts- Header, Body, and Tail. The Header
> > > consists of the bytes “ORC” to support tools that want to scan the front
> > > of the file to determine the type of the file."
> > >
> > > On Thu, Dec 14, 2017 at 2:00 PM, Dain Sundstrom <[email protected]> wrote:
> > >
> > >> Does the ORC spec require that a file start with “ORC”?
> > >>
> > >> -dain
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> >
> --
> regards,
> Deepak Majeti
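
For anyone who wants to check the header requirement discussed in this thread without the 'file' command, here is a minimal sketch in Python. It only tests whether a file begins with the three bytes "ORC"; it does not look at the tail and does not validate the file in any other way. The function name has_orc_header is made up for illustration and is not part of any ORC library.

```python
ORC_MAGIC = b"ORC"  # header bytes named in the spec text quoted in this thread


def has_orc_header(path):
    """Return True if the file at `path` starts with the bytes b"ORC".

    Hypothetical helper for illustration only. It checks the header,
    not the tail, so it is not a full ORC validity check.
    """
    with open(path, "rb") as f:
        # An empty or too-short file yields fewer than 3 bytes and compares unequal.
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC
```

A zero-length file returns False here, which lines up with the 'file' output reporting zero.orc as empty rather than as Apache ORC.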
