[ https://issues.apache.org/jira/browse/DRILL-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005723#comment-16005723 ]
ASF GitHub Bot commented on DRILL-5498:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/830

    DRILL-5498: Improve handling of CSV column headers

    See DRILL-5498 for details.

    Replaced the repeated varchar reader for reading columns with a purpose built column parser.

    Implemented rules to recover from invalid column headers.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5498

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/830.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #830

----
commit c9a3c5e1d0c21e8d6375436a42937a1d3062c8ab
Author: Paul Rogers <prog...@maprtech.com>
Date:   2017-05-10T23:17:24Z

    DRILL-5498: Better handling of CSV column headers

    See DRILL-5498 for details.

    Replaced the repeated varchar reader for reading columns with a purpose built column parser.

    Implemented rules to recover from invalid column headers.

commit 1446b095acb44c3e1c2acfb0dd4e5572cc21867f
Author: Paul Rogers <prog...@maprtech.com>
Date:   2017-05-10T23:38:58Z

    Added missing test method

----

> CSV text reader does not properly handle duplicate header names
> ---------------------------------------------------------------
>
>                 Key: DRILL-5498
>                 URL: https://issues.apache.org/jira/browse/DRILL-5498
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the following CSV file:
> {code}
> h,h,h
> a,b,c
> d,e,f
> {code}
> Parse this with the CSV storage plugin configured to parse headers. The result:
> {code}
> 2 row(s):
> h
> c
> f
> {code}
> Expected a runtime error for the duplicate column names, or automatic
> "uniqification" of the names. Certainly did not expect the first two columns
> to be dropped.
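The pull request summary says only that "rules to recover from invalid column headers" were implemented; it does not spell the rules out. As a hedged illustration, the sketch below contrasts two things. First, a row keyed by header name alone collapses the three `h` columns into one entry holding the last value, which is consistent with (though not confirmed as the cause of) the reported result. Second, it shows one possible "uniqification" scheme. The class `HeaderDeduplicator`, its methods, the `_2`/`_3` suffixes, the `column_<index>` fallback for blank names, and the case-insensitive comparison are all assumptions for illustration, not Drill's actual code or behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Apache Drill.
public class HeaderDeduplicator {

  // Keys a row by header name alone: each duplicate name overwrites the
  // previous entry, so only the last column's value survives.
  static Map<String, String> naiveRow(String[] headers, String[] values) {
    Map<String, String> row = new LinkedHashMap<>();
    for (int i = 0; i < headers.length; i++) {
      row.put(headers[i], values[i]);
    }
    return row;
  }

  // Assumed repair rules: a null or blank name becomes "column_<index>";
  // a name already seen (compared case-insensitively) gets the first free
  // numeric suffix starting at _2, e.g. "h", "h_2", "h_3".
  static List<String> uniquify(String[] headers) {
    Set<String> seen = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (int i = 0; i < headers.length; i++) {
      String base = headers[i] == null ? "" : headers[i].trim();
      if (base.isEmpty()) {
        base = "column_" + i;
      }
      String name = base;
      int suffix = 2;
      // Set.add returns false when the lowercased name is already taken.
      while (!seen.add(name.toLowerCase(Locale.ROOT))) {
        name = base + "_" + suffix++;
      }
      result.add(name);
    }
    return result;
  }

  public static void main(String[] args) {
    String[] headers = {"h", "h", "h"};
    System.out.println(naiveRow(headers, new String[] {"a", "b", "c"}));
    System.out.println(uniquify(headers));
  }
}
```

Running `main` prints `{h=c}` and then `[h, h_2, h_3]`. Renaming keeps every column queryable. The alternative the reporter mentions, failing with a runtime error on duplicates, would replace the suffix loop with a check that throws.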