[ 
https://issues.apache.org/jira/browse/ARROW-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157378#comment-17157378
 ] 

Andy Grove commented on ARROW-8205:
-----------------------------------

I think I am fine with removing Expr::Column(usize) from the logical plan and 
removing the complexity there. I hope I'm not forgetting any important reason 
for still having it there.

I also think that projections in the physical plan should generally be based on 
name, so that we can better handle cases where parquet partitions don't all have 
the same schema, and also handle schemaless use cases better, e.g. querying 
JSON.

However, the output from the parquet scan should generally have a known schema 
(based on the projection) so it would be nice for all the upstream operators to 
be able to operate on indices rather than names for good performance. I would 
be interested to see any impact on performance from your changes, perhaps by 
running the benchmark crate in the project.
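
As a rough sketch of the idea in the two paragraphs above (not DataFusion 
code): a scan could resolve the projected column names against each file's own 
schema once, and the operators consuming its output could then work purely on 
the resulting indices. The function name `resolve_projection` and the use of 
plain string slices in place of real Arrow schemas are assumptions made only 
for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of DataFusion: map each projected column
/// name to its position in one file's schema. Plain `&str` slices stand in
/// for real Arrow schemas so the sketch has no external dependencies.
pub fn resolve_projection(
    file_columns: &[&str],
    projection: &[&str],
) -> Result<Vec<usize>, String> {
    // Build the name -> index lookup once per file, keeping the first
    // occurrence if a name happens to appear more than once.
    let mut by_name: HashMap<&str, usize> = HashMap::with_capacity(file_columns.len());
    for (index, name) in file_columns.iter().enumerate() {
        by_name.entry(*name).or_insert(index);
    }

    // Resolve every requested name; fail on the first one that is missing.
    projection
        .iter()
        .map(|name| {
            by_name
                .get(name)
                .copied()
                .ok_or_else(|| format!("column '{}' not found in file schema", name))
        })
        .collect()
}
```

With this shape, two partitions that store the same columns in a different 
order yield different index lists for the same projection, while the scan 
output keeps the projection's column order, so later operators can address 
columns by position.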

> [Rust] [DataFusion] DataFusion should enforce unique field names in a schema
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-8205
>                 URL: https://issues.apache.org/jira/browse/ARROW-8205
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>    Affects Versions: 0.16.0
>            Reporter: Andy Grove
>            Priority: Major
>             Fix For: 2.0.0
>
>
> There does not seem to be any validation to avoid schemas being created with 
> duplicate field names. We should add this along with unit tests.
> This will require changing the signature of the constructors to try_new with 
> a Result return type.
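
A minimal sketch of the validation described in the issue above, assuming 
simplified stand-in `Field` and `Schema` types rather than the real Arrow 
structs, and a `String` error in place of the crate's own error type:

```rust
use std::collections::HashSet;

/// Stand-in for an Arrow field; only the name matters for this check.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

/// Stand-in for an Arrow schema (hypothetical, simplified).
#[derive(Debug, PartialEq)]
pub struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Fallible constructor: rejects field lists that contain duplicate names.
    pub fn try_new(fields: Vec<Field>) -> Result<Self, String> {
        {
            // Track the names seen so far; `insert` returns false on a repeat.
            let mut seen = HashSet::with_capacity(fields.len());
            for field in &fields {
                if !seen.insert(field.name.as_str()) {
                    return Err(format!("duplicate field name '{}'", field.name));
                }
            }
        }
        Ok(Schema { fields })
    }

    /// Read-only access to the validated fields.
    pub fn fields(&self) -> &[Field] {
        &self.fields
    }
}
```

Callers would then handle the `Result` at construction time instead of 
discovering ambiguous column names later during planning.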


