[ 
https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-8210:
-----------------------------------------
    Description: 
While testing duplicate column names, I ran into multiple issues:

* Factory fails if there are duplicate columns, even for a single file
* In addition, we should fix and/or test that the factory works for duplicate 
columns if the schemas are equal
* Once a Dataset with duplicated columns is created, scanning without any 
column projection fails
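
To make the three cases above concrete, here is a minimal sketch in plain Python of the conditions involved. It is only an illustration, not Arrow's implementation: schemas are modelled as lists of (name, type) pairs, and the helper names `duplicate_field_names` and `schemas_match` are hypothetical.

```python
from collections import Counter


def duplicate_field_names(names):
    """Return the field names that occur more than once, in first-seen order."""
    counts = Counter(names)
    seen = set()
    duplicates = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


def schemas_match(schemas):
    """Return True if every schema (a list of (name, type) pairs) equals the first."""
    return all(schema == schemas[0] for schema in schemas[1:])


# Hypothetical schema of a single file with a duplicated column "a".
single_file = [("a", "int64"), ("a", "int64"), ("b", "string")]

# Case 1: duplicates exist even within a single file.
print(duplicate_field_names([name for name, _ in single_file]))

# Case 2: two files with the same duplicated schema are equal, so
# combining them should arguably not be an error.
print(schemas_match([single_file, single_file]))
```

With pyarrow, the list of names could presumably be taken from `schema.names`; the third case (scanning without a projection) is not modelled here.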

> [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-8210
>                 URL: https://issues.apache.org/jira/browse/ARROW-8210
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, C++ - Dataset
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> While testing duplicate column names, I ran into multiple issues:
> * Factory fails if there are duplicate columns, even for a single file
> * In addition, we should fix and/or test that the factory works for 
> duplicate columns if the schemas are equal
> * Once a Dataset with duplicated columns is created, scanning without any 
> column projection fails



