Re: [CSV] Strategies to handle duplicate headers

sebb Wed, 21 Jun 2023 10:00:51 -0700

On Tue, 20 Jun 2023 at 12:39, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Hi All,
>
> This thread is a follow-up to
> https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
>
> Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe
> that's a feature that we could have in Commons CSV too?"
>
> What does that mean, and what does it actually do? Say I have a
> column A whose row 1 value is "X" and a second column A whose row 1
> value is 2. What do I get when I ask for column A, row 1?
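
As far as I know, Pandas renames repeated column names on read by
appending a numeric suffix (A, A.1, A.2, ...). Asking for A would then
return the first column ("X" here), and the second column would only be
reachable as A.1. A minimal sketch of that renaming in Java follows. The
class and method names are hypothetical and nothing here is an existing
Commons CSV API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupHeaders {

    /**
     * Hypothetical helper (not part of Commons CSV): keeps the first
     * occurrence of each header and renames later repeats to name.1,
     * name.2, and so on. For example, [A, B, A] becomes [A, B, A.1].
     *
     * Assumption: a generated name never collides with a header that is
     * already present (for example a literal "A.1" column); a real
     * implementation would have to decide how to handle that case.
     */
    static List<String> deduplicate(List<String> headers) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String header : headers) {
            // merge returns the updated occurrence count for this header
            int count = seen.merge(header, 1, Integer::sum);
            result.add(count == 1 ? header : header + "." + (count - 1));
        }
        return result;
    }
}
```

Note that this changes the header names the caller sees, which is
exactly the concern raised further down about projects that want the
original names preserved.
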
>
> Seth says:
> "HeaderStrategy Interface
> Contains two functions:
>
> #normalizeHeaders(headings) - With given heading, output a list that
> fits with whatever the strategy is going for.
> #get(record, header) - Fetch value(s) based on given column name."
>
> I would perhaps see two interfaces, so that lambdas could be used
> more simply. This probably needs an example.
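
One possible reading of the two-interface idea, as a rough sketch only.
The interface names, the extra headers parameter on get (used here
instead of a CSVRecord to keep the example self-contained), and the two
sample strategies are all assumptions made for illustration; none of
this is an agreed design or an existing Commons CSV API.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderStrategySketch {

    /** Hypothetical: maps the raw header row to the names the parser exposes. */
    @FunctionalInterface
    interface HeaderNormalizer {
        List<String> normalizeHeaders(List<String> headings);
    }

    /** Hypothetical: fetches the value(s) of a record for a given column name. */
    @FunctionalInterface
    interface HeaderLookup {
        List<String> get(List<String> headers, List<String> record, String header);
    }

    // Sample strategy: leave the header names untouched, duplicates included.
    static final HeaderNormalizer KEEP_AS_IS = headings -> headings;

    // Sample strategy: return every value whose column name matches,
    // in column order, so duplicate columns yield more than one value.
    static final HeaderLookup ALL_MATCHES = (headers, record, header) -> {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < headers.size() && i < record.size(); i++) {
            if (headers.get(i).equals(header)) {
                values.add(record.get(i));
            }
        }
        return values;
    };
}
```

With single-method interfaces, each half of a strategy can be supplied
as a lambda or method reference independently of the other, which is
harder to do with one interface that declares both methods.
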
>
> "I'm also wary that this may screw up existing projects that depend on
> allowing/disallowing duplicates. i.e. want to allow duplicates and
> handle things through indexes / iteration, so this didn't cause a
> problem for them and want to preserve header names, and so don't need
> the headers deduplicated."
>
> As a first cut whatever we do could/should maintain the existing
> behavior. We can change the default later by popular demand.


Changing the default later would be a breaking change, so I would be
against that.

> Gary
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>

