[ 
https://issues.apache.org/jira/browse/DRILL-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5662:
----------------------------------
    Component/s: Storage - Text & CSV

> Compliant text reader (CSV) opens, closes, reopens file with headers
> --------------------------------------------------------------------
>
>                 Key: DRILL-5662
>                 URL: https://issues.apache.org/jira/browse/DRILL-5662
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: Future
>
>
> The "compliant" (CSV) reader can optionally read headers from a file. To do so, 
> the reader:
> * Opens the input stream
> * Reads headers
> * Closes the input stream
> * Opens the input stream
> * Reads data (skipping headers)
> * Closes the input stream
> While the above certainly works, the close/open cycle is unnecessary. Many 
> CSV readers simply read the header and then use the same stream to read the 
> data. Drill should do the same.
> In fact, Drill has historically coded its own header scanner. The first was 
> badly broken, but DRILL-5498 improved the parsing (though not the file handling).
> Given that Drill's "compliant" text reader is based on the UniVocity library, 
> and that library can parse headers, we should probably just reuse that 
> existing code which has, very likely, evolved to handle the header usages 
> seen in the wild.
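
For illustration only, the single-pass pattern the description asks for (open once, read the header, keep reading data from the same stream, close once) might look like the sketch below. This is not Drill's reader and does not use the UniVocity API; the class name SinglePassCsv, the Result holder, and the naive comma split (no quoting or escaping) are all assumptions made to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Drill: reads header and data in one pass.
final class SinglePassCsv {

    /** Header names plus data rows, both taken from the same stream. */
    static final class Result {
        public final List<String> headers;
        public final List<List<String>> rows;

        Result(List<String> headers, List<List<String>> rows) {
            this.headers = headers;
            this.rows = rows;
        }
    }

    /**
     * Opens the stream once, consumes the first line as the header, then
     * continues reading data rows from the same reader before closing it.
     * Assumption: fields are separated by plain commas with no quoting.
     */
    static Result read(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String headerLine = in.readLine();
            if (headerLine == null) {
                // Empty input: no header and no data.
                return new Result(Collections.<String>emptyList(),
                                  new ArrayList<List<String>>());
            }
            List<String> headers = Arrays.asList(headerLine.split(",", -1));

            // No close/reopen here: the reader is already positioned
            // just past the header line.
            List<List<String>> rows = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(Arrays.asList(line.split(",", -1)));
            }
            return new Result(headers, rows);
        }
    }
}
```

Calling SinglePassCsv.read(new StringReader("a,b\n1,2\n")) would return the header [a, b] and one data row, with the underlying stream opened and closed exactly once. A real implementation would delegate the parsing to the library's own header support, as the description suggests.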



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
