alamb commented on code in PR #12466:
URL: https://github.com/apache/datafusion/pull/12466#discussion_r1761626074
##########
datafusion/sql/src/statement.rs:
##########
@@ -1028,8 +1030,26 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
.into_iter()
.collect();
- let schema = self.build_schema(columns)?;
- let df_schema = schema.to_dfschema_ref()?;
+ let df_schema = match file_type.as_str() {
Review Comment:
I am sorry for the delayed feedback @devanbenz -- I swear I typed this
feedback but I must not have clicked "submit".
Basically my concerns about this approach are twofold:
1. This code assumes the parquet file is on the local filesystem (when for
many systems it may be on remote object storage)
2. It also makes SQL parsing depend on the parquet format. Since
`parquet` has quite a few dependencies, this new dependency is likely not ideal
for systems that use DataFusion for SQL parsing (like dask-sql, for
example)
Perhaps you could delay the creation of the ORDER BY until the table
provider is resolved?
The table provider:
https://github.com/apache/datafusion/blob/2521043ddcb3895a2010b8e328f3fa10f77fc094/datafusion/expr/src/planner.rs#L35-L34
Once the table provider is resolved, the table's schema is known.
Another benefit of this approach is that it would work for all formats, not
just parquet.
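
To make the suggestion a bit more concrete, here is a very rough, self-contained sketch of the idea. All of the names in it (`UnresolvedSort`, `SchemaSource`, `InMemoryTable`, `ResolvedSort`, `resolve_order_by`) are hypothetical stand-ins I made up for illustration, not DataFusion APIs: the SQL layer only records the ORDER BY column names, and they are checked against the schema later, once something that knows the schema is available.

```rust
/// Hypothetical: what the SQL planner would record. It holds only names,
/// so no file has to be opened while parsing and planning the statement.
#[derive(Debug, Clone, PartialEq)]
struct UnresolvedSort {
    column: String,
    ascending: bool,
}

/// Hypothetical stand-in for a resolved table provider. It knows its schema
/// regardless of file format or where the data is stored.
trait SchemaSource {
    fn column_names(&self) -> Vec<String>;
}

/// Toy implementation used only to exercise the sketch.
struct InMemoryTable {
    columns: Vec<String>,
}

impl SchemaSource for InMemoryTable {
    fn column_names(&self) -> Vec<String> {
        self.columns.clone()
    }
}

/// Hypothetical: a sort key that has been validated against a real schema.
#[derive(Debug, Clone, PartialEq)]
struct ResolvedSort {
    column_index: usize,
    ascending: bool,
}

/// Resolve the deferred ORDER BY columns once the schema is available.
/// Returns an error naming the first column that is not in the schema.
fn resolve_order_by(
    sorts: &[UnresolvedSort],
    source: &dyn SchemaSource,
) -> Result<Vec<ResolvedSort>, String> {
    let names = source.column_names();
    sorts
        .iter()
        .map(|s| {
            names
                .iter()
                .position(|n| n == &s.column)
                .map(|column_index| ResolvedSort {
                    column_index,
                    ascending: s.ascending,
                })
                .ok_or_else(|| {
                    format!("ORDER BY column '{}' not found in table schema", s.column)
                })
        })
        .collect()
}
```

The point of the split is that nothing in the first struct needs the parquet crate or filesystem access; only the later resolution step touches the schema, and it does so through whatever provider was resolved.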
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]