Re: [csv] validation of duplicate headers (was [VOTE] Release Apache Commons CSV 1.10.0 based on RC1)

Alex Herbert Tue, 25 Oct 2022 14:30:43 -0700

On Tue, 25 Oct 2022 at 17:29, Alex Herbert wrote:
>
> On Sun, 23 Oct 2022 at 19:44, Alex Herbert wrote:
> >
> > Summary:
> >
> > 1. Should CSVParser treat null and blank headers as the same when
> > checking for duplicates, i.e. all are considered an 'empty' name? This
> > is current CSVFormat behaviour.
> > 2. Should CSVFormat respect ignoreHeaderCase when checking for
> > duplicates? This is current CSVParser behaviour.
> > 3. Should blank column names be sanitised to the empty string ""? This
> > is not current behaviour but is the logical behaviour for checking
> > duplicates in CSVFormat.
>
> I have proposed a fix for this in PR #279 [1]. It maintains a flag
> that notes when any type of missing header name has occurred. Thus it
> now throws when a duplicate null is found when using
> DuplicateHeaderMode.DISALLOW.
>
> I marked the PR as a WIP. It should probably have an associated Jira
> ticket to track this change if merged. Or it could be added to CSV-264
> as further details of that fix [2].
>
> I have not updated the documentation for ignoreHeaderCase to address item 2.
>
> The functionality with regard to the header map is unchanged since the
> header map does not store null headers, and any missing headers are
> not modified (i.e. they are not all sanitised to the empty string "").
>
> Alex
>
> [1] https://github.com/apache/commons-csv/pull/279
> [2] https://issues.apache.org/jira/browse/CSV-264


PR now updated with:

- Documentation of the parser-specific flag 'ignore header case'
- CSVDuplicateHeaderTest updated with test cases for case-insensitive
duplicates

I believe this to be all that is required to fix the issues with
handling duplicate header names.
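
For readers following the thread, the duplicate check discussed above can be
sketched roughly as below. This is only an illustration and not the code in
PR #279: the class DuplicateHeaderSketch and the method firstDuplicate are
hypothetical names, "blank" is assumed to mean empty after trimming, and the
behaviour shown corresponds to a DISALLOW-style check where a second missing
name (null or blank) counts as a duplicate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; not part of Apache Commons CSV. */
public class DuplicateHeaderSketch {

    /**
     * Returns the index of the first duplicate header name, or -1 if none.
     * Null and blank names are all treated as one 'missing' name, tracked
     * with a flag rather than stored in the set of seen names.
     */
    static int firstDuplicate(List<String> headers, boolean ignoreHeaderCase) {
        Set<String> seen = new HashSet<>();
        boolean missingSeen = false; // set once any null/blank name occurs
        for (int i = 0; i < headers.size(); i++) {
            String name = headers.get(i);
            // Assumption: blank means empty after trimming whitespace.
            boolean missing = name == null || name.trim().isEmpty();
            if (missing) {
                if (missingSeen) {
                    return i; // second missing name is a duplicate
                }
                missingSeen = true;
            } else {
                // Fold case only when the caller asks for it.
                String key = ignoreHeaderCase
                        ? name.toLowerCase(Locale.ROOT) : name;
                if (!seen.add(key)) {
                    return i; // name already seen
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("id", null, "Name", " ", "name");
        System.out.println(firstDuplicate(headers, true));
        System.out.println(firstDuplicate(Arrays.asList("Name", "name"), false));
    }
}
```

Missing names are tracked with a flag and never rewritten, which matches the
point made earlier that missing headers are not sanitised to the empty
string "".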

Alex
