Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1030

FWIW, the native CSV reader does the following:

* To read the header, it seeks to offset 0 in the file, regardless of which block is being read, then reads the header. This may be a remote read.
* The reader then seeks to the start of its block. If this is the first block, it skips the header; otherwise it searches for the start of the next record.

Since Hive has the same challenges, Hive must have solved this; we have only to research that existing solution.

One simple solution is:

* If block number is 0, skip the header.
* If block number is 1 or larger, look for the next record separator.
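For illustration only, the block-handling rule described above might be sketched roughly as follows. This is not Drill's or Hive's code: the function name `read_block_records`, its parameters, and the use of a seekable in-memory stream in place of a DFS file are all hypothetical. The boundary convention (a non-first block always discards up to the next separator, and each block reads any record that starts at or before its end offset) is an added assumption so that no record is read twice or dropped. The sketch also assumes newline-terminated records with no quoted newlines.

```python
import io
from typing import BinaryIO, List, Tuple


def read_block_records(stream: BinaryIO, start: int, length: int) -> Tuple[bytes, List[bytes]]:
    """Hypothetical sketch: return (header, records) for the block [start, start + length)."""
    # The header always lives at offset 0, whichever block we were assigned.
    stream.seek(0)
    header_line = stream.readline()
    header_end = stream.tell()
    end = start + length

    if start == 0:
        # First block: skip the header and begin with the first data record.
        pos = header_end
    else:
        # Later block: look for the next record separator. Whatever precedes it
        # belongs to a record that the previous block is responsible for.
        stream.seek(start)
        stream.readline()
        pos = stream.tell()

    stream.seek(pos)
    records: List[bytes] = []
    # Assumed convention: a block owns every record that starts at or before
    # its end offset, so the final record may run past the block boundary.
    while pos <= end:
        line = stream.readline()
        if not line:  # end of file
            break
        records.append(line.rstrip(b"\r\n"))
        pos = stream.tell()
    return header_line.rstrip(b"\r\n"), records


if __name__ == "__main__":
    data = b"id,name\n1,a\n2,b\n3,c\n4,d\n"
    for block_start in range(0, len(data), 10):
        print(block_start, read_block_records(io.BytesIO(data), block_start, 10))
```

Under these assumptions every block sees the same header, and each record is returned by exactly one block, even when a record straddles or starts exactly on a block boundary.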
---