Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1030
  
    FWIW, the native CSV reader does the following:
    
    * To read the header, it seeks to offset 0 in the file, regardless of which block is being read, and reads the header line. This may be a remote read.
    * The reader then seeks to the start of its block. If this is the first block, it skips the header; otherwise it searches for the start of the next record.
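
As a rough illustration of the first step only, here is a minimal Python sketch (not Drill's actual code). The function name `read_header_then_seek` and its parameters are hypothetical, and splitting on a bare comma is a simplification that ignores quoted fields.

```python
import io

def read_header_then_seek(f, block_start, delimiter=b","):
    """Read column names from the start of the file, then move to the block.

    Hypothetical helper: `f` is any seekable binary file object and
    `block_start` is the byte offset of the block this reader owns.
    """
    # The header always lives at offset 0, whichever block is being read.
    # On a distributed file system this seek may trigger a remote read.
    f.seek(0)
    header_line = f.readline()
    # Simplification: no quoting or escaping is handled here.
    columns = header_line.rstrip(b"\r\n").split(delimiter)
    # Return to the start of this reader's own block.
    f.seek(block_start)
    return columns
```

After the call, the file position is at `block_start`; aligning to a record boundary from there is a separate step.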
    
    Since Hive faces the same challenge, it must already have solved this; we only have to research that existing solution.
    
    One simple solution is:
    
    * If block number is 0, skip the header.
    * If block number is 1 or larger, look for the next record separator.
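
The rule above could be sketched as follows in Python. This is an illustration, not the Drill or Hive implementation; `read_block_records` and its parameters are hypothetical names. It also assumes one convention the comment does not spell out: a block owns every record that starts at or before its end offset, and a later block always discards everything up to the first separator it finds. Some such convention is needed so that a record crossing a block boundary is read exactly once.

```python
def read_block_records(data: bytes, block_start: int, block_end: int,
                       block_number: int, sep: bytes = b"\n") -> list:
    """Return the records owned by one block of `data`.

    Hypothetical sketch: `data` stands in for the whole file, and
    `block_end` is the offset where the next block begins.
    """
    if block_number == 0:
        # First block: skip the header line.
        hit = data.find(sep, 0)
    else:
        # Later block: look for the next record separator. Whatever precedes
        # it belongs to a record that the previous block reads (assumption).
        hit = data.find(sep, block_start)
    pos = len(data) if hit == -1 else hit + len(sep)

    records = []
    # Assumption: read every record that starts at or before block_end,
    # running past the block end if needed to finish the last record.
    while pos <= block_end and pos < len(data):
        nxt = data.find(sep, pos)
        if nxt == -1:
            records.append(data[pos:])
            pos = len(data)
        else:
            records.append(data[pos:nxt])
            pos = nxt + len(sep)
    return records
```

Under that convention, reading every block in turn yields each data record once and never the header, including when a record starts exactly on a block boundary.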

