[jira] [Commented] (CSV-264) Duplicate empty header names are allowed even with `.withAllowDuplicateHeaderNames(false)`

Syed Shah (Jira) Fri, 02 Oct 2020 11:18:43 -0700


    [ 
https://issues.apache.org/jira/browse/CSV-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206389#comment-17206389
 ]


Syed Shah commented on CSV-264:
-------------------------------

It looks to me like this is intentional. 

https://github.com/apache/commons-csv/blob/master/src/main/java/org/apache/commons/csv/CSVParser.java#L506

{code:java}
// Note: This will always allow a duplicate header if the header is empty
{code}

A possible reason for this could be the following:

||A||B|| || ||C||D||
|1|2| | |3|4|

I don't think it would be user-friendly to make this a failure, but the header 
row does contain duplicate column names if we count the empty headers that are 
left in for formatting. Granted, I do think it would be silly to store a bunch 
of empty cells as well.
 
---

I can think of a couple of ways this issue could be solved:
# Simply count empty header names towards duplicates. Documents like the one 
above that use gaps for formatting would then have to enable 
withAllowDuplicateHeaderNames, so I don't think this is ideal.
# Check the entire column when parsing the document, and count an empty header 
towards the duplicates only if its column contains values in a non-header row. 
This would skip empty headers that exist only for formatting, but it sounds 
like it could get expensive.
# Create a new option which allows empty duplicates. I don't think this makes 
much sense, though, as it is too similar to, and conflicts with, 
withAllowDuplicateHeaderNames.
# Change the withAllowDuplicateHeaderNames boolean to a withDuplicateHeaderRule 
enum with the values "ALLOW_ALL_DUPLICATES", "ALLOW_EMPTY_DUPLICATES", 
"DISALLOW_DUPLICATES".

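To make option 4 more concrete, here is a rough, standalone sketch of how such a rule could be checked against a header row. It does not use or modify Commons CSV. The class name DuplicateHeaderSketch, the enum DuplicateHeaderRule, and the validate method are all hypothetical, and treating null or blank headers as "empty" is an assumption on my part.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these names exist in Commons CSV 1.8.
final class DuplicateHeaderSketch {

    /** Rule names taken from option 4 in the list above. */
    enum DuplicateHeaderRule {
        ALLOW_ALL_DUPLICATES,
        ALLOW_EMPTY_DUPLICATES,
        DISALLOW_DUPLICATES
    }

    /** Throws IllegalArgumentException if the header row violates the rule. */
    static void validate(List<String> headers, DuplicateHeaderRule rule) {
        if (rule == DuplicateHeaderRule.ALLOW_ALL_DUPLICATES) {
            return; // nothing to check
        }
        Set<String> seen = new HashSet<>();
        for (String header : headers) {
            // Assumption: null and whitespace-only names both count as "empty".
            boolean empty = header == null || header.trim().isEmpty();
            if (empty && rule == DuplicateHeaderRule.ALLOW_EMPTY_DUPLICATES) {
                continue; // empty headers never count as duplicates under this rule
            }
            String key = empty ? "" : header;
            if (!seen.add(key)) {
                throw new IllegalArgumentException(
                        "Duplicate header name: \"" + key + "\" in " + headers);
            }
        }
    }

    public static void main(String[] args) {
        // Header row from the table above, with two empty spacer columns.
        List<String> headers = Arrays.asList("A", "B", "", "", "C", "D");

        validate(headers, DuplicateHeaderRule.ALLOW_EMPTY_DUPLICATES); // accepted

        try {
            validate(headers, DuplicateHeaderRule.DISALLOW_DUPLICATES);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With something like this, ALLOW_EMPTY_DUPLICATES would keep today's behaviour for spacer columns, while DISALLOW_DUPLICATES would reject the CSV from the issue description.
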
I was going to submit a PR for option 1, but I don't think it's a good solution.

If a maintainer could say what they think of this, I'd gladly try to submit a 
patch for it. I'm just unsure about the ideal solution, or whether this is 
considered a non-issue. 




> Duplicate empty header names are allowed even with 
> `.withAllowDuplicateHeaderNames(false)`
> ------------------------------------------------------------------------------------------
>
>                 Key: CSV-264
>                 URL: https://issues.apache.org/jira/browse/CSV-264
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.8
>            Reporter: Sagar Tiwari
>            Priority: Major
>
> I'm trying to parse a csv like this:
>  
> {{CSVFormat.DEFAULT}}
> {{ .withHeader()}}
> {{ .withAllowDuplicateHeaderNames(false)}}
> {{ .withAllowMissingColumnNames()}}
> {{ .parse(InputStreamReader(FileInputStream(fl)))}}
>  
> One would expect this code to throw an error if the following csv is given as 
> input:
>  
>  
> {{"","a",""}}
> {{"1","X","3"}}
> {{"3","Y","4"}}
>  
> But it doesn't, and asking for `record.get("")` gives the value from the 
> second column. The first column is ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
