Re: [csv] validation of duplicate headers (was [VOTE] Release Apache Commons CSV 1.10.0 based on RC1)

2022-10-25 Thread Alex Herbert
On Tue, 25 Oct 2022 at 17:29, Alex Herbert wrote:
>
> On Sun, 23 Oct 2022 at 19:44, Alex Herbert wrote:
> >
> > Summary:
> >
> > 1. Should CSVParser treat null and blank headers as the same when
> > checking for duplicates, i.e. all are considered an 'empty' name? This
> > is current CSVFormat behaviour.
> > 2. Should CSVFormat respect ignoreHeaderCase when checking for
> > duplicates? This is current CSVParser behaviour.
> > 3. Should blank column names be sanitised to the empty string ""? This
> > is not current behaviour but is the logical behaviour for checking
> > duplicates in CSVFormat.
>
> I have proposed a fix for this in PR #279 [1]. It maintains a flag
> that notes when any type of missing header name has occurred. Thus it
> now throws when a duplicate null is found when using
> DuplicateHeaderMode.DISALLOW.
>
> I marked the PR as a WIP. It should probably have an associated Jira
> ticket to track this change if merged. Or it could be added to CSV-264
> as further details of that fix [2].
>
> I have not updated the documentation for ignoreHeaderCase to address item 2.
>
> The functionality with regard to the header map is unchanged since the
> header map does not store null headers, and any missing headers are
> not modified (i.e. they are not all sanitised to the empty string "").
>
> Alex
>
> [1] https://github.com/apache/commons-csv/pull/279
> [2] https://issues.apache.org/jira/browse/CSV-264

The PR is now updated with:

- Documentation of the parser-specific flag 'ignore header case'
- Test cases in CSVDuplicateHeaderTest covering case-insensitive
duplicates

I believe this to be all that is required to fix the issues with
handling duplicate header names.
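
For illustration only, here is a minimal sketch of what a case-insensitive
duplicate check can look like. It is not the code in the PR. The class and
method names are hypothetical, it uses only the JDK, and it assumes all names
are non-null (missing names are a separate concern).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Commons CSV.
final class CaseInsensitiveDuplicates {

    /**
     * Returns true if any name occurs more than once.
     * When ignoreCase is true, "ID" and "id" count as the same name.
     * Assumes every name is non-null.
     */
    static boolean hasDuplicate(boolean ignoreCase, String... headers) {
        // A TreeSet ordered by CASE_INSENSITIVE_ORDER treats names that
        // differ only by case as equal; a HashSet compares them exactly.
        Set<String> seen = ignoreCase
                ? new TreeSet<String>(String.CASE_INSENSITIVE_ORDER)
                : new HashSet<String>();
        for (String name : headers) {
            if (!seen.add(name)) {
                return true; // add() returns false when the name was already present
            }
        }
        return false;
    }
}
```

With this sketch, the headers "ID" and "id" are reported as duplicates only
when ignoreCase is true.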

Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [csv] validation of duplicate headers (was [VOTE] Release Apache Commons CSV 1.10.0 based on RC1)

2022-10-25 Thread Alex Herbert
On Sun, 23 Oct 2022 at 19:44, Alex Herbert wrote:
>
> Summary:
>
> 1. Should CSVParser treat null and blank headers as the same when
> checking for duplicates, i.e. all are considered an 'empty' name? This
> is current CSVFormat behaviour.
> 2. Should CSVFormat respect ignoreHeaderCase when checking for
> duplicates? This is current CSVParser behaviour.
> 3. Should blank column names be sanitised to the empty string ""? This
> is not current behaviour but is the logical behaviour for checking
> duplicates in CSVFormat.

I have proposed a fix for this in PR #279 [1]. It maintains a flag
that notes when any type of missing header name has occurred. Thus it
now throws when a duplicate null is found when using
DuplicateHeaderMode.DISALLOW.
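
As a rough illustration of the flag idea (a sketch, not the code in the PR):
the class below is hypothetical and uses only the JDK. It assumes that null,
empty and whitespace-only names all count as "missing", and it models only the
strict behaviour where every duplicate is rejected.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Commons CSV.
final class MissingHeaderCheck {

    /** Assumption: null, empty and whitespace-only names are all "missing". */
    static boolean isMissing(String name) {
        return name == null || name.trim().isEmpty();
    }

    /**
     * Throws IllegalArgumentException if any header name repeats.
     * All missing names are treated as one name, tracked by a flag
     * rather than stored in the set.
     */
    static void checkNoDuplicates(String... headers) {
        Set<String> seen = new HashSet<>();
        boolean missingSeen = false; // has any missing name occurred so far?
        for (String name : headers) {
            boolean duplicate;
            if (isMissing(name)) {
                duplicate = missingSeen; // second missing name is a duplicate
                missingSeen = true;
            } else {
                duplicate = !seen.add(name);
            }
            if (duplicate) {
                throw new IllegalArgumentException(
                        "Duplicate header name: " + (isMissing(name) ? "<missing>" : name));
            }
        }
    }
}
```

In this sketch a single null header passes, while a second missing header
(null or blank) throws, because the flag is already set.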

I marked the PR as a WIP. It should probably have an associated Jira
ticket to track this change if merged. Or it could be added to CSV-264
as further details of that fix [2].

I have not updated the documentation for ignoreHeaderCase to address item 2.

The functionality with regard to the header map is unchanged since the
header map does not store null headers, and any missing headers are
not modified (i.e. they are not all sanitised to the empty string "").
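
To make the header map point concrete, here is a small sketch under stated
assumptions. The class is hypothetical and JDK-only. It skips null names and
stores blank names exactly as given, without sanitising them to "". Letting a
later duplicate overwrite an earlier index is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Commons CSV.
final class HeaderMapSketch {

    /** Maps each non-null header name to its column index. */
    static Map<String, Integer> buildHeaderMap(String... headers) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            String name = headers[i];
            if (name != null) {
                // Blank names such as " " and "" are kept as distinct keys.
                map.put(name, i);
            }
        }
        return map;
    }
}
```

For the headers "id", null, " ", "" this yields three entries: the null
column has no key, and " " and "" remain separate keys.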

Alex

[1] https://github.com/apache/commons-csv/pull/279
[2] https://issues.apache.org/jira/browse/CSV-264
