[ https://issues.apache.org/jira/browse/CSV-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206389#comment-17206389 ]
Syed Shah commented on CSV-264:
-------------------------------

It looks to me like this is intentional.

https://github.com/apache/commons-csv/blob/master/src/main/java/org/apache/commons/csv/CSVParser.java#L506
{code:java}
// Note: This will always allow a duplicate header if the header is empty
{code}
A possible reason for this could be a document like the following:
||A||B|| || ||C||D||
|1|2| | |3|4|

I don't think it would be user-friendly to make this a failure, but it does have duplicate column names if we count the empty headers that are left in for formatting. Granted, I do think it'd be silly to essentially store a bunch of empty cells as well.

---

I can think of a couple of ways this issue could be solved:
# Simply count empty header names towards duplicates. Documents like the one above that use gaps for formatting purposes would then have to enable withAllowDuplicateHeaderNames, so I don't think this is ideal.
# Check the entire column when parsing the document, and count it towards the duplicates only if it contains values in a non-header row. This would skip empty headers that exist purely for formatting, but it sounds like it could get heavy.
# Create a new option which allows empty duplicates. I don't think this option makes much sense, as it's too similar to, and conflicts with, withAllowDuplicateHeaderNames.
# Change the withAllowDuplicateHeaderNames boolean to a withDuplicateHeaderRule enum with the values "ALLOW_ALL_DUPLICATES", "ALLOW_EMPTY_DUPLICATES", "DISALLOW_DUPLICATES".

I was going to submit a PR for option 1, but I don't think it's a good solution. If a maintainer could say what they think of this, I'd gladly try to submit a patch for it. I'm just unsure of the ideal solution, or whether this is considered a non-issue.
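To make option 4 a bit more concrete, here is a rough, self-contained sketch of how such a rule could be checked against a header row. It does not use Commons CSV at all: the class DuplicateHeaderSketch, the enum DuplicateHeaderRule and the method checkHeaders are hypothetical names used only for illustration, and treating null or zero-length names as "empty" is an assumption rather than what the parser does today.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Commons CSV.
final class DuplicateHeaderSketch {

    /** Rule values as proposed in option 4 of the comment. */
    public enum DuplicateHeaderRule {
        ALLOW_ALL_DUPLICATES,
        ALLOW_EMPTY_DUPLICATES,
        DISALLOW_DUPLICATES
    }

    /**
     * Throws IllegalArgumentException if the header row violates the rule.
     * Assumption: a header counts as empty when it is null or zero-length.
     */
    public static void checkHeaders(List<String> headers, DuplicateHeaderRule rule) {
        if (rule == DuplicateHeaderRule.ALLOW_ALL_DUPLICATES) {
            return; // nothing to check
        }
        Set<String> seen = new HashSet<>();
        for (String header : headers) {
            boolean empty = header == null || header.isEmpty();
            if (empty && rule == DuplicateHeaderRule.ALLOW_EMPTY_DUPLICATES) {
                continue; // blank formatting columns never count as duplicates
            }
            String key = empty ? "" : header;
            if (!seen.add(key)) {
                throw new IllegalArgumentException(
                        "Duplicate header name \"" + key + "\" in " + headers);
            }
        }
    }

    public static void main(String[] args) {
        // Header row from the issue description: "","a",""
        List<String> reported = Arrays.asList("", "a", "");
        checkHeaders(reported, DuplicateHeaderRule.ALLOW_EMPTY_DUPLICATES); // accepted
        try {
            checkHeaders(reported, DuplicateHeaderRule.DISALLOW_DUPLICATES);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a rule like this, the formatting-style document above (A, B, two blank columns, C, D) would pass under ALLOW_EMPTY_DUPLICATES, while the header row from the issue description would be rejected under DISALLOW_DUPLICATES.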

> Duplicate empty header names are allowed even with `.withAllowDuplicateHeaderNames(false)`
> ------------------------------------------------------------------------------------------
>
>                 Key: CSV-264
>                 URL: https://issues.apache.org/jira/browse/CSV-264
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.8
>            Reporter: Sagar Tiwari
>            Priority: Major
>
> I'm trying to parse a csv like this:
>
> {{CSVFormat.DEFAULT}}
> {{ .withHeader()}}
> {{ .withAllowDuplicateHeaderNames(false)}}
> {{ .withAllowMissingColumnNames()}}
> {{ .parse(InputStreamReader(FileInputStream(fl)))}}
>
> One would expect this code to throw an error if the following csv is given as input:
>
> {{"","a",""}}
> {{"1","X","3"}}
> {{"3","Y","4"}}
>
> But it doesn't, and asking for `record.get("")` gives the value from the second column. The first column is ignored.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)