[GitHub] drill issue #1030: DRILL-5941: Skip header / footer improvements for Hive st...

arina-ielchiieva Tue, 14 Nov 2017 09:38:57 -0800

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1030
  
    @ppadma 
    To create a reader for each input split and still maintain skip header / footer
    functionality, we need to know how many rows are in each input split.
    Unfortunately, an input split does not hold that information, only its size in
    bytes [1]. We also can't apply skip header only to the first input split and
    skip footer only to the last one, since we don't know how many rows will be
    skipped: we may need to skip the whole first input split and part of the
    second.
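    To illustrate why the split alone is not enough, here is a minimal, self-contained sketch (not Drill or Hadoop code). `ByteSplit` is a hypothetical stand-in for `FileSplit` that carries only a start offset and a length in bytes; the file content and the 16-byte split size are made up for the example. Working out how many header rows each split must drop requires scanning the bytes for row boundaries, and with three header rows the skip covers the whole first split and one row of the second.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitHeaderDemo {

    // Hypothetical stand-in for Hadoop's FileSplit: it knows only a byte offset
    // and a byte length, never how many rows it contains.
    static final class ByteSplit {
        final long start;
        final long length;

        ByteSplit(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Cut a file into fixed-size byte ranges; row boundaries are ignored.
    static List<ByteSplit> makeSplits(long fileLength, long splitSize) {
        List<ByteSplit> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new ByteSplit(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    // Count rows whose first byte lies inside the split (a row belongs to the
    // split where it starts). This has to scan the bytes: the split itself
    // carries no row count.
    static int rowsStartingIn(byte[] data, ByteSplit split) {
        long end = split.start + split.length;
        int count = 0;
        for (int i = (int) split.start; i < end; i++) {
            boolean rowStart = (i == 0) || data[i - 1] == '\n';
            if (rowStart) {
                count++;
            }
        }
        return count;
    }

    // How many header rows each split would have to drop for a given header count.
    static int[] headerRowsPerSplit(byte[] data, List<ByteSplit> splits, int headerCount) {
        int[] skip = new int[splits.size()];
        int remaining = headerCount;
        for (int s = 0; s < splits.size() && remaining > 0; s++) {
            int rows = rowsStartingIn(data, splits.get(s));
            skip[s] = Math.min(rows, remaining);
            remaining -= skip[s];
        }
        return skip;
    }

    public static void main(String[] args) {
        byte[] data = "HEADER_A\nHEADER_B\nHEADER_C\nr1\nr2\nr3\n"
                .getBytes(StandardCharsets.US_ASCII);
        List<ByteSplit> splits = makeSplits(data.length, 16);
        // Three header rows: the whole first split and one row of the second.
        System.out.println(Arrays.toString(headerRowsPerSplit(data, splits, 3))); // [2, 1, 0]
    }
}
```

    The 36-byte file yields three splits holding 2, 3 and 1 rows, which only becomes known after reading the data itself.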
    
    @paul-rogers 
    To read from Hive we actually use the Hadoop reader [2, 3], so, if I am not
    mistaken, unfortunately the approach described above cannot be applied.
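    For comparison, a rough sketch of skip header / footer when a single reader walks the whole file sequentially. It assumes `java.util.Iterator` as a stand-in for a Hadoop `RecordReader` (whose real method is `next(key, value)`), and `skipHeaderFooter` is a hypothetical helper; the two counts play the role of Hive's `skip.header.line.count` and `skip.footer.line.count` table properties. The footer needs a FIFO buffer because a sequential reader only learns which records were last once the stream ends.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class HeaderFooterSkipper {

    // Hypothetical helper: drops the first headerCount and the last footerCount
    // records of one sequential stream (Iterator stands in for RecordReader).
    // Note: ArrayDeque rejects null elements, so records must be non-null.
    static <T> List<T> skipHeaderFooter(Iterator<T> records, int headerCount, int footerCount) {
        List<T> out = new ArrayList<>();
        // Header: discard the first headerCount records.
        for (int i = 0; i < headerCount && records.hasNext(); i++) {
            records.next();
        }
        // Footer: hold back footerCount records and emit the oldest one
        // only when a newer record arrives behind it.
        Deque<T> buffer = new ArrayDeque<>();
        while (records.hasNext()) {
            buffer.addLast(records.next());
            if (buffer.size() > footerCount) {
                out.add(buffer.removeFirst());
            }
        }
        // Whatever remains in the buffer is the footer and is dropped.
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("h1", "h2", "a", "b", "c", "f1");
        System.out.println(skipHeaderFooter(rows.iterator(), 2, 1)); // [a, b, c]
    }
}
```

    The buffer only works because this one reader sees the true start and end of the file; independent per-split readers would each need row counts they do not have.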
    
    [1] https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapred/FileSplit.html
    [2] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveAbstractReader.java#L234
    [3] https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapred/RecordReader.html
