Hi Deepak,

The ORC C++ writer does write the “ORC” magic at the beginning of the file, but
the reader does not verify it when opening the file (the same is true of the
Java reader, as far as I can tell). There’s probably a reason for that: since
the reader already verifies the magic in the postscript at the file tail,
checking the header as well is unnecessary and would require an additional IO.
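For illustration, here is a minimal Python sketch of the two checks. It is not the actual C++ or Java reader code. It assumes, per the ORC file-tail docs, that the final byte of the file is the postscript length and that the postscript ends with the “ORC” magic. `tail_read_size` is a hypothetical parameter standing in for the chunk a reader fetches from the end of the file anyway. The header check is a separate read at offset 0.

```python
import io
import os
from typing import BinaryIO

MAGIC = b"ORC"

def tail_has_magic(f: BinaryIO, tail_read_size: int = 16 * 1024) -> bool:
    """Check the magic at the end of the postscript with one read from the tail.

    Assumes the last byte of the file is the postscript length and that the
    postscript's final bytes are the magic string.
    """
    size = f.seek(0, os.SEEK_END)
    n = min(size, tail_read_size)
    f.seek(size - n)
    tail = f.read(n)  # single read; a real reader also parses postscript/footer from it
    if len(tail) < len(MAGIC) + 1:
        return False
    ps_len = tail[-1]
    # If the postscript is not fully inside the buffer, a real reader would
    # issue another read; this sketch just reports False.
    if ps_len < len(MAGIC) or ps_len + 1 > len(tail):
        return False
    # The magic sits immediately before the postscript-length byte.
    return tail[-1 - len(MAGIC):-1] == MAGIC

def header_has_magic(f: BinaryIO) -> bool:
    """Front-of-file check: a second, separate read at offset 0."""
    f.seek(0)
    return f.read(len(MAGIC)) == MAGIC

if __name__ == "__main__":
    # Synthetic bytes, not a real ORC file: header, filler body, and a tiny
    # "postscript" holding only the magic field (protobuf field 8000, string).
    ps = b"\x82\xf4\x03\x03ORC"
    demo = io.BytesIO(MAGIC + b"\x00" * 20 + ps + bytes([len(ps)]))
    print(tail_has_magic(demo), header_has_magic(demo))
```

Both checks pass on the demo bytes; the difference is that the tail check rides on a read the reader needs anyway, while the header check costs one more IO.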


> On Dec 15, 2017, at 8:55 AM, Owen O'Malley <[email protected]> wrote:
> 
> On a side note, I put a patch into the Linux 'file' command that makes it
> recognize ORC files. If you've got file 5.31 or later, you'll get:
> 
> owen@laptop> file examples/*.orc
> examples/TestOrcFile.columnProjection.orc:              Apache ORC
> examples/TestOrcFile.emptyFile.orc:                     Apache ORC
> examples/TestOrcFile.metaData.orc:                      Apache ORC
> examples/TestOrcFile.test1.orc:                         Apache ORC
> examples/TestOrcFile.testDate1900.orc:                  Apache ORC
> examples/TestOrcFile.testDate2038.orc:                  Apache ORC
> examples/TestOrcFile.testMemoryManagementV11.orc:       Apache ORC
> examples/TestOrcFile.testMemoryManagementV12.orc:       Apache ORC
> examples/TestOrcFile.testPredicatePushdown.orc:         Apache ORC
> examples/TestOrcFile.testSeek.orc:                      Apache ORC
> examples/TestOrcFile.testSnappy.orc:                    Apache ORC
> examples/TestOrcFile.testStringAndBinaryStatistics.orc: Apache ORC
> examples/TestOrcFile.testStripeLevelStats.orc:          Apache ORC
> examples/TestOrcFile.testTimestamp.orc:                 Apache ORC
> examples/TestOrcFile.testUnionAndTimestamp.orc:         Apache ORC
> examples/TestOrcFile.testWithoutIndex.orc:              Apache ORC
> examples/TestVectorOrcFile.testLz4.orc:                 Apache ORC
> examples/TestVectorOrcFile.testLzo.orc:                 Apache ORC
> examples/decimal.orc:                                   Apache ORC
> examples/demo-11-none.orc:                              Apache ORC
> examples/demo-11-zlib.orc:                              Apache ORC
> examples/demo-12-zlib.orc:                              Apache ORC
> examples/nulls-at-end-snappy.orc:                       Apache ORC
> examples/orc-file-11-format.orc:                        Apache ORC
> examples/orc_index_int_string.orc:                      Apache ORC
> examples/orc_split_elim.orc:                            Apache ORC
> examples/orc_split_elim_new.orc:                        Apache ORC
> examples/over1k_bloom.orc:                              Apache ORC
> examples/version1999.orc:                               Apache ORC
> examples/zero.orc:                                      empty
> 
> It looks like the file command has finally added support for offsets
> relative to the end of the file, so we could extend it with more information.
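A rough Python imitation of what a front-scanning detector like `file` does, to make the header-vs-tail distinction concrete. This is not the magic entry from the actual patch. "data" mirrors what `file` prints for unrecognized binary content, and the "postscript length" suffix is a hypothetical example of extra information an end-relative rule could report, assuming (per the ORC file-tail docs) that the last byte of the file is the postscript length and the postscript ends with the magic.

```python
import io
import os
from typing import BinaryIO

def describe(f: BinaryIO) -> str:
    """Classify a stream roughly the way a front-scanning detector would."""
    size = f.seek(0, os.SEEK_END)
    if size == 0:
        return "empty"
    f.seek(0)
    if f.read(3) != b"ORC":  # the header magic the spec requires
        return "data"
    desc = "Apache ORC"
    # Hypothetical extension using offsets relative to the end of the file.
    if size >= 4:
        f.seek(-1, os.SEEK_END)
        ps_len = f.read(1)[0]           # assumed: last byte = postscript length
        f.seek(-4, os.SEEK_END)
        if f.read(3) == b"ORC":         # assumed: postscript ends with the magic
            desc += f", postscript length {ps_len}"
    return desc

if __name__ == "__main__":
    ps = b"\x82\xf4\x03\x03ORC"  # synthetic postscript containing only the magic
    print(describe(io.BytesIO(b"")))
    print(describe(io.BytesIO(b"ORC" + ps + bytes([len(ps)]))))
```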
> 
> .. Owen
> 
> 
> On Fri, Dec 15, 2017 at 7:00 AM, Deepak Majeti <[email protected]> wrote:
> 
>> Hi Xiening,
>> 
>> The readers (both Java and C++) just use the "magic" bytes present in the
>> Tail to verify ORC files. But the spec requires the "ORC" bytes to be present
>> in the header as well, to support tools that scan from the front.
>> You can verify this with the ORC files written by the Java writer.
>> I just noticed this requirement today as well. We should support this in
>> the C++ writer too if we don't already.
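On the writer side, the requirement only affects the order of writes. Below is a minimal Python sketch of the Header/Body/Tail layout, assuming everything between the header and the postscript (stripes, footer, metadata) and the serialized postscript are produced elsewhere. `write_layout` is a hypothetical helper, not part of either ORC library, and its output is not a readable ORC file by itself.

```python
import io
from typing import BinaryIO

MAGIC = b"ORC"

def write_layout(out: BinaryIO, body: bytes, postscript: bytes) -> None:
    """Write header magic, body, postscript, and the postscript-length byte.

    `body` stands in for stripes plus footer/metadata; `postscript` must
    already be serialized and end with the magic string.
    """
    if len(postscript) > 255:
        raise ValueError("postscript length must fit in the final byte")
    if not postscript.endswith(MAGIC):
        raise ValueError("postscript should end with the magic string")
    out.write(MAGIC)                      # Header: what front-scanning tools look for
    out.write(body)                       # everything between header and postscript
    out.write(postscript)                 # postscript, magic last
    out.write(bytes([len(postscript)]))   # final byte: postscript length

if __name__ == "__main__":
    buf = io.BytesIO()
    write_layout(buf, b"\x00" * 5, b"\x82\xf4\x03\x03ORC")
    print(buf.getvalue()[:3], len(buf.getvalue()))
```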
>> 
>> 
>> On Fri, Dec 15, 2017 at 2:45 AM, Dain Sundstrom <[email protected]> wrote:
>> 
>>> Thanks Deepak. I was searching for “magic” and missed this part.
>>> 
>>> -dain
>>> 
>>>> On Dec 14, 2017, at 7:16 PM, Deepak Majeti <[email protected]> wrote:
>>>> 
>>>> Hi Dain,
>>>> 
>>>> The ORC spec requires that a file start with "ORC".
>>>> 
>>>> From https://orc.apache.org/docs/file-tail.html
>>>> 
>>>> "The file is broken in to three parts- Header, Body, and Tail. The Header
>>>> consists of the bytes “ORC’’ to support tools that want to scan the front
>>>> of the file to determine the type of the file."
>>>> 
>>>> On Thu, Dec 14, 2017 at 2:00 PM, Dain Sundstrom <[email protected]> wrote:
>>>> 
>>>>> Does the ORC spec require that a file start with “ORC”?
>>>>> 
>>>>> -dain
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> regards,
>>>> Deepak Majeti
>>> 
>>> 
>> 
>> 
>> --
>> regards,
>> Deepak Majeti
>> 
