[GitHub] drill issue #1030: DRILL-5941: Skip header / footer improvements for Hive st...

paul-rogers Mon, 13 Nov 2017 11:47:40 -0800

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1030
  
    FWIW, the native CSV reader does the following:
    
    * To read the header, it seeks to offset 0 in the file, regardless of which block is being read, and reads the header from there. This may be a remote read.
    * The reader then seeks to the start of its block. If this is the first block, it skips the header; otherwise it searches for the start of the next record.
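
A rough Python sketch of the first step, for illustration only. This is not Drill's actual code; `read_header` and `open_block` are hypothetical names, and the stream is assumed to be seekable and newline-delimited.

```python
import io

def read_header(stream, delimiter=b","):
    """Read the column names from offset 0, wherever the stream currently is."""
    stream.seek(0)                       # always go back to the start of the file
    line = stream.readline()             # the header is assumed to be the first line
    return line.rstrip(b"\r\n").split(delimiter)

def open_block(stream, block_start):
    """Return the column names, leaving the stream positioned at block_start."""
    columns = read_header(stream)        # may be a remote read for blocks other than 0
    stream.seek(block_start)             # then move to the block this reader owns
    return columns
```

Every block's reader sees the same column names this way, at the cost of one extra seek and read per block.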
    
    Since Hive has the same challenges, it must already have solved this; we have only to research that existing solution.
    
    One simple solution is:
    
    * If block number is 0, skip the header.
    * If block number is 1 or larger, look for the next record separator.
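
The two rules above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the file is held in memory as bytes, records are separated by `\n`, a record belongs to the block that contains its first byte, and `read_block` is a hypothetical helper, not a Drill or Hive API.

```python
import io

def read_block(data, start, length, has_header=True):
    """Return the records whose first byte lies in [start, start + length)."""
    stream = io.BytesIO(data)
    end = start + length
    if start == 0:
        # Block 0: the header is the first line, so skip it.
        if has_header:
            stream.readline()
    else:
        # Later blocks: back up one byte and read through the next record
        # separator. If the previous byte is itself a separator, the record
        # starting exactly at `start` is kept.
        stream.seek(start - 1)
        stream.readline()
    records = []
    # Read every record that starts inside this block, even if it ends
    # in the next one.
    while stream.tell() < end:
        line = stream.readline()
        if not line:
            break                        # end of file
        records.append(line.rstrip(b"\n"))
    return records
```

With this ownership rule each record is read by exactly one block, whatever the block size. A header that is longer than block 0 is also handled, since later blocks discard the partial line they start in.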
