[jira] [Commented] (CSV-164) Support duplicate header names
[ https://issues.apache.org/jira/browse/CSV-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504609#comment-17504609 ]

Gary D. Gregory commented on CSV-164:
-

We now have a duplicate header mode from [CSV-264] in git master for the forthcoming 1.10.0. See also our Maven snapshot repository https://repository.apache.org/content/repositories/snapshots/.

> Support duplicate header names
> --
>
> Key: CSV-164
> URL: https://issues.apache.org/jira/browse/CSV-164
> Project: Commons CSV
> Issue Type: Bug
> Affects Versions: 1.2
> Reporter: Romain Manni-Bucau
> Priority: Major
>
> nothing prevents a CSV to have the same time the same header name so
> validation at the end of org.apache.commons.csv.CSVFormat#validate should
> likely disappear or should support a flag to disable it

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (CSV-164) Support duplicate header names
[ https://issues.apache.org/jira/browse/CSV-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329555#comment-15329555 ]

Steffen Zschaler commented on CSV-164:
--

Code is available here: https://github.com/szschaler/commons-csv/tree/list_empty_headers

Let me know if this is of interest for pushing back into Commons CSV and I will write some tests and prepare a proper pull request.
[jira] [Commented] (CSV-164) Support duplicate header names
[ https://issues.apache.org/jira/browse/CSV-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329481#comment-15329481 ]

Steffen Zschaler commented on CSV-164:
--

I have a similar issue, specifically where the duplicate headers are empty headers. So, for example:

{code}
A,B,,C,,D
1,2,,3,,4
{code}

The empty columns here have been inserted for readability. I need to do some processing over the file, removing some columns and doing some updates to other places, and then write out the modified CSV file again. Ideally, I would keep the empty columns so that readability is maintained. I also need to keep the header names from the original file. Finally, I have no _ad hoc_ information about how many columns there are in total (beyond a number of standard columns at the left of the file), so cannot easily predefine an artificial header either.

Currently, Commons CSV cannot handle this because it only keeps track of the last empty column.

For this specific use case, I think there is a solution that is non-API breaking by providing additional functionality to get a list of all columns with empty headers if empty headers are allowed (which can be flagged already). Optionally, we could also stop putting empty headers into the header map, but this may break some users.

I'm going to have a go at implementing this in a commons-csv fork anyway, as I need it for my current project. Is there an interest in having this contributed back to the main code and if so, should I open a separate issue for it or reference it to this issue?
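The idea of returning every column with an empty header, rather than only the last one, can be sketched with the standard library alone. This is a minimal illustration and not the code in the fork or in Commons CSV: the class `EmptyHeaderColumns` and the method `emptyHeaderIndexes` are hypothetical names, and the naive comma split assumes a header line without quoted or escaped delimiters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons CSV.
final class EmptyHeaderColumns {

    /** Returns the zero-based positions of every header cell that is empty or blank. */
    static List<Integer> emptyHeaderIndexes(String headerLine) {
        // A limit of -1 keeps trailing empty cells, which String.split drops by default.
        // Assumption: no quoted fields, so a plain comma split is good enough here.
        String[] cells = headerLine.split(",", -1);
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            if (cells[i].trim().isEmpty()) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Header line from the example in the comment.
        System.out.println(emptyHeaderIndexes("A,B,,C,,D")); // prints [2, 4]
    }
}
```

With the positions in hand, a caller could copy the empty columns through unchanged when writing the modified file, since they are addressed by index and never by name.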
[jira] [Commented] (CSV-164) Support duplicate header names
[ https://issues.apache.org/jira/browse/CSV-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021800#comment-15021800 ]

Michael Osipov commented on CSV-164:

How is get supposed to work on a map when the column name is not unique?
[jira] [Commented] (CSV-164) Support duplicate header names
[ https://issues.apache.org/jira/browse/CSV-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021924#comment-15021924 ]

Romain Manni-Bucau commented on CSV-164:

This is what I proposed earlier: if you call get("duplicatedColumnName") it should fail, but get("uniqueColumnName") should still work. This doesn't remove the access (parse) feature, since you can still access the column by index.

Typically, in the batchee mapping we have a thin layer on top of [csv] where you can map CSV onto an object either by index or by name. If you use names then you define header names, but index access is privileged over name access, which makes this case pretty smooth: in https://github.com/apache/incubator-batchee/blob/master/extensions/commons-csv/src/main/java/org/apache/batchee/csv/mapper/DefaultMapper.java#L89 the values of fieldByPosition and fieldByName are unique (i.e. fieldByName doesn't have any duplicate with fieldByPosition) to guarantee that all of this works.

Access by header name is a nice API most of the time, but it is not the way CSV really works (columns are indexed by definition), so I like this API but it has some limits, which we hit with this issue.
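The lookup semantics proposed above (name access fails only for a duplicated name, index access always works) can be sketched as follows. This is a standalone illustration under stated assumptions, not the Commons CSV or batchee implementation: `DuplicateAwareRecord` is a hypothetical class, and the choice of `IllegalStateException` for the ambiguous case is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type, not part of Commons CSV.
final class DuplicateAwareRecord {

    // Each header name maps to every position where it occurs, in column order.
    private final Map<String, List<Integer>> positionsByName = new LinkedHashMap<>();
    private final String[] values;

    DuplicateAwareRecord(String[] headers, String[] values) {
        this.values = values.clone();
        for (int i = 0; i < headers.length; i++) {
            positionsByName.computeIfAbsent(headers[i], k -> new ArrayList<>()).add(i);
        }
    }

    /** Index access always works, whether or not header names are unique. */
    String get(int index) {
        return values[index];
    }

    /** Name access works for unique names and fails for duplicated ones. */
    String get(String name) {
        List<Integer> positions = positionsByName.get(name);
        if (positions == null) {
            throw new IllegalArgumentException("No such header: " + name);
        }
        if (positions.size() > 1) {
            // Assumption: an exception is how the ambiguous case is reported.
            throw new IllegalStateException(
                "Header '" + name + "' occurs at columns " + positions + "; use index access");
        }
        return values[positions.get(0)];
    }

    public static void main(String[] args) {
        DuplicateAwareRecord record = new DuplicateAwareRecord(
            new String[] {"id", "name", "name"},
            new String[] {"1", "a", "b"});
        System.out.println(record.get("id")); // unique name: prints 1
        System.out.println(record.get(2));    // index access: prints b
    }
}
```

Keeping a list of positions per name, instead of a single position, is what lets the record detect ambiguity at lookup time rather than rejecting the header during validation.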