houqp commented on pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-826380176
@jorgecarleitao your understanding of the column changes in the logical and physical plans is correct. I would add that in the physical plan, the string column name is probably not needed. We currently only use the index field for evaluation; I kept the name mostly for debugging purposes. But since the column name is also available in the physical schema fields, I think it should be safe to store only the index in physical column expressions.

The answer to your schema change question is a little tricky, so let me try to clarify the new behavior. In short, it changes the field names in logical plan schemas, because we require all columns to be normalized when building the plan. For physical schemas, column names should not change, except when columns are wrapped with operators.

Using your SQL as an example:

```sql
SELECT a FROM t1
```

The logical schema field will be normalized to `t1.a`. However, the final execution output will have a physical/Arrow schema with the field `a`. The qualifier is stripped during physical planning at: https://github.com/houqp/arrow-datafusion/blob/8ecc215bb7fe44d8cf9dcb4b90df753f0c50afb7/datafusion/src/physical_plan/planner.rs#L483-L486

The DataFrame API behaves the same way, since both SQL and DataFrame go through the same query builder interface:

```rust
let df = ctx.table("temp")?;
// project column `a`, execute the plan, and inspect the output Arrow schema
let batches = df.select_columns(&["a"])?.collect().await?;
batches[0].schema().field(0).name()
```

The above code will result in `a`. So far this is the same as what DataFusion does today. The difference comes in when operators are used, for example:

```sql
SELECT a, MAX(b) FROM t1 GROUP BY a
```

This will result in two unqualified fields: `a` and `MAX(t1.b)`. Basically, I made sure the behavior is consistent with MySQL, PostgreSQL and Spark.
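To make the naming rules above concrete, here is a minimal, std-only Rust sketch. The types `Column`, `Expr`, `PhysicalColumn` and the functions `normalize`, `logical_field_name`, `physical_field_name` are hypothetical simplifications for illustration, not DataFusion's actual API, and a row is modeled as a plain `&[i64]` instead of an Arrow `RecordBatch`.

```rust
// Hypothetical, simplified model of the column naming rules; NOT DataFusion's API.

#[derive(Clone, Debug)]
struct Column {
    relation: Option<String>, // table qualifier, e.g. Some("t1")
    name: String,
}

#[derive(Clone, Debug)]
enum Expr {
    Column(Column),
    Max(Box<Expr>), // stands in for any operator wrapping a column
}

/// Logical planning: every bare column gets normalized with its table qualifier.
fn normalize(expr: &Expr, table: &str) -> Expr {
    match expr {
        Expr::Column(c) if c.relation.is_none() => Expr::Column(Column {
            relation: Some(table.to_string()),
            name: c.name.clone(),
        }),
        Expr::Column(c) => Expr::Column(c.clone()),
        Expr::Max(inner) => Expr::Max(Box::new(normalize(inner, table))),
    }
}

/// Logical schema field name: the fully qualified display form.
fn logical_field_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(c) => match &c.relation {
            Some(r) => format!("{}.{}", r, c.name),
            None => c.name.clone(),
        },
        Expr::Max(inner) => format!("MAX({})", logical_field_name(inner)),
    }
}

/// Physical (Arrow) field name: a bare column loses its qualifier, while an
/// expression wrapped by an operator keeps its full display name.
fn physical_field_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(c) => c.name.clone(),
        other => logical_field_name(other),
    }
}

/// Physical column expression that only stores an index; the name, if needed
/// for debugging, can be read back from the output schema's fields.
struct PhysicalColumn {
    index: usize,
}

impl PhysicalColumn {
    fn evaluate(&self, row: &[i64]) -> i64 {
        row[self.index]
    }
    fn name<'a>(&self, schema: &'a [String]) -> &'a str {
        &schema[self.index]
    }
}

fn main() {
    // SELECT a, MAX(b) FROM t1 GROUP BY a
    let a = Expr::Column(Column { relation: None, name: "a".to_string() });
    let b = Expr::Column(Column { relation: None, name: "b".to_string() });
    let projection: Vec<Expr> = vec![a, Expr::Max(Box::new(b))]
        .iter()
        .map(|e| normalize(e, "t1"))
        .collect();

    let logical: Vec<String> = projection.iter().map(logical_field_name).collect();
    let physical: Vec<String> = projection.iter().map(physical_field_name).collect();
    println!("{:?}", logical); // ["t1.a", "MAX(t1.b)"]
    println!("{:?}", physical); // ["a", "MAX(t1.b)"]

    // The physical column needs only its index; the name comes from the schema.
    let col = PhysicalColumn { index: 0 };
    let row = [7_i64, 42];
    println!("{} = {}", col.name(&physical), col.evaluate(&row)); // a = 7
}
```

Keeping the name out of `PhysicalColumn` in this sketch mirrors the point above: evaluation only touches the index, so the schema stays the single source of truth for field names.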