On a side note, I contributed a patch to the Linux 'file' command that makes
it recognize ORC files. If you've got file 5.31 or later, you'll get:

owen@laptop> file examples/*.orc
examples/TestOrcFile.columnProjection.orc:              Apache ORC
examples/TestOrcFile.emptyFile.orc:                     Apache ORC
examples/TestOrcFile.metaData.orc:                      Apache ORC
examples/TestOrcFile.test1.orc:                         Apache ORC
examples/TestOrcFile.testDate1900.orc:                  Apache ORC
examples/TestOrcFile.testDate2038.orc:                  Apache ORC
examples/TestOrcFile.testMemoryManagementV11.orc:       Apache ORC
examples/TestOrcFile.testMemoryManagementV12.orc:       Apache ORC
examples/TestOrcFile.testPredicatePushdown.orc:         Apache ORC
examples/TestOrcFile.testSeek.orc:                      Apache ORC
examples/TestOrcFile.testSnappy.orc:                    Apache ORC
examples/TestOrcFile.testStringAndBinaryStatistics.orc: Apache ORC
examples/TestOrcFile.testStripeLevelStats.orc:          Apache ORC
examples/TestOrcFile.testTimestamp.orc:                 Apache ORC
examples/TestOrcFile.testUnionAndTimestamp.orc:         Apache ORC
examples/TestOrcFile.testWithoutIndex.orc:              Apache ORC
examples/TestVectorOrcFile.testLz4.orc:                 Apache ORC
examples/TestVectorOrcFile.testLzo.orc:                 Apache ORC
examples/decimal.orc:                                   Apache ORC
examples/demo-11-none.orc:                              Apache ORC
examples/demo-11-zlib.orc:                              Apache ORC
examples/demo-12-zlib.orc:                              Apache ORC
examples/nulls-at-end-snappy.orc:                       Apache ORC
examples/orc-file-11-format.orc:                        Apache ORC
examples/orc_index_int_string.orc:                      Apache ORC
examples/orc_split_elim.orc:                            Apache ORC
examples/orc_split_elim_new.orc:                        Apache ORC
examples/over1k_bloom.orc:                              Apache ORC
examples/version1999.orc:                               Apache ORC
examples/zero.orc:                                      empty

It looks like the file command has finally added support for negative offsets
(measured from the end of the file), so we could extend it to report more
information.
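
For anyone who wants to do the same check without 'file', here is a rough
Python sketch of the front-of-file test. The header check follows the spec
text quoted further down in this thread. The tail check is an assumption on my
part about the layout (last byte is the postscript length, with "ORC"
immediately before it), and the function names are made up for illustration,
so treat it as a sketch rather than how the readers actually do it.

```python
ORC_MAGIC = b"ORC"


def has_orc_header(data: bytes) -> bool:
    """Return True if the buffer starts with the 3-byte "ORC" header."""
    return data[:len(ORC_MAGIC)] == ORC_MAGIC


def has_orc_tail_magic(data: bytes) -> bool:
    """Return True if "ORC" sits just before the final byte of the buffer.

    Assumption (not taken from this thread): the last byte of the file is
    the postscript length and the magic string immediately precedes it.
    """
    if len(data) < len(ORC_MAGIC) + 1:
        return False
    return data[-(len(ORC_MAGIC) + 1):-1] == ORC_MAGIC


def looks_like_orc(path: str) -> bool:
    """Front-of-file check only: read three bytes and compare."""
    with open(path, "rb") as f:
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC
```

An empty file such as examples/zero.orc fails both checks, which matches the
"empty" result in the listing above.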

.. Owen


On Fri, Dec 15, 2017 at 7:00 AM, Deepak Majeti <[email protected]>
wrote:

> Hi Xiening,
>
> The readers (both Java and C++) only use the "magic" bytes in the
> Tail to verify ORC files. But the spec also requires the "ORC" bytes in
> the header, to support tools that scan from the front.
> You can verify this with ORC files written by the Java writer.
> I only noticed this requirement today myself. We should support this in
> the C++ writer too if we don't already.
>
>
> On Fri, Dec 15, 2017 at 2:45 AM, Dain Sundstrom <[email protected]> wrote:
>
> > Thanks Deepak. I was searching for “magic” and missed this part.
> >
> > -dain
> >
> > > On Dec 14, 2017, at 7:16 PM, Deepak Majeti <[email protected]> wrote:
> > >
> > > Hi Dain,
> > >
> > > The ORC spec requires that a file start with "ORC".
> > >
> > > From https://orc.apache.org/docs/file-tail.html
> > >
> > > "The file is broken in to three parts- Header, Body, and Tail. The
> > > Header consists of the bytes “ORC” to support tools that want to scan
> > > the front of the file to determine the type of the file."
> > >
> > > On Thu, Dec 14, 2017 at 2:00 PM, Dain Sundstrom <[email protected]> wrote:
> > >
> > >> Does the ORC spec require that a file start with “ORC”?
> > >>
> > >> -dain
> > >
> > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> >
> >
>
>
> --
> regards,
> Deepak Majeti
>
