[ https://issues.apache.org/jira/browse/DRILL-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005476#comment-16005476 ]
Paul Rogers commented on DRILL-5498:
------------------------------------

Proposed solution:

* Avoid failing queries unless absolutely necessary. Try to "fix up" headers where possible.
* Ignore leading and trailing whitespace in a header.
* If the header line has no headers at all, fail the query.
* If any header is empty, make up a header of the form "column_x", where x is the column position.
* If any header contains invalid SQL symbol characters, replace each such character with "_".
* If the first character of a header is invalid, replace that character with "col_" (since underscore is not valid as the first character).
* If header j duplicates header i, i < j, append "_x" to header j, where x is 2, 3, 4, ... until a unique name is found.
* Allow Unicode characters in headers.

For example:

{code}
Headers: a, b, c
Column names: a, b, c

Headers: (none)
Produce an error

Headers: , , (blank headers)
Column names: column_1, column_2, column_3

Headers: _a, 99, h!
Column names: col_a, col_99, h_

Headers: a, a, a
Column names: a, a_2, a_3
{code}

Headers that worked in the prior version continue to work. Headers that failed in the prior version now work.

> CSV text reader does not properly handle duplicate header names
> ---------------------------------------------------------------
>
>                 Key: DRILL-5498
>                 URL: https://issues.apache.org/jira/browse/DRILL-5498
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the following CSV file:
> {code}
> h,h,h
> a,b,c
> d,e,f
> {code}
> Parse this with the CSV storage plugin configured to parse headers. The result:
> {code}
> 2 row(s):
> h
> c
> f
> {code}
> Expected a runtime error for the duplicate column names, or automatic
> "uniqification" of the names. Certainly did not expect the first two columns
> to be dropped.
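
The header fix-up rules proposed in the comment above could be sketched roughly as follows. This is a minimal illustration in Python, not Drill's actual (Java) implementation. The function name `fix_headers` is hypothetical, and several details are assumptions made to reproduce the comment's examples: "valid" characters are taken to be Unicode letters, digits, and underscore; a leading underscore is replaced by "col_" while any other invalid leading character (such as a digit) gets "col_" prepended; and duplicate detection is an exact, case-sensitive match.

```python
def fix_headers(raw_headers):
    """Turn raw CSV header strings into unique, SQL-friendly column names.

    Hypothetical sketch of the proposed rules; not Drill's real code.
    """
    # Rule: no headers at all is the one case that fails the query.
    if not raw_headers:
        raise ValueError("header line contains no headers")

    names = []
    seen = set()
    for position, raw in enumerate(raw_headers, start=1):
        # Rule: ignore leading and trailing whitespace.
        name = raw.strip()
        if not name:
            # Rule: an empty header becomes column_<1-based position>.
            name = f"column_{position}"
        else:
            # Rule: replace invalid symbol characters with "_".
            # Assumption: Unicode letters and digits plus "_" are valid.
            name = "".join(
                ch if (ch.isalnum() or ch == "_") else "_" for ch in name
            )
            first = name[0]
            if first == "_":
                # Assumption: a leading "_" is swapped for "col_"
                # (matches the example _a -> col_a).
                name = "col_" + name[1:]
            elif not first.isalpha():
                # Assumption: a leading digit gets "col_" prepended
                # (matches the example 99 -> col_99).
                name = "col_" + name

        # Rule: append _2, _3, ... until the name is unique.
        base = name
        suffix = 2
        while name in seen:
            name = f"{base}_{suffix}"
            suffix += 1
        seen.add(name)
        names.append(name)
    return names
```

Applied to the file from the issue description, `fix_headers(["h", "h", "h"])` yields `h`, `h_2`, `h_3`, so all three columns would be preserved rather than collapsing into one.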