jorgecarleitao commented on pull request #9600: URL: https://github.com/apache/arrow/pull/9600#issuecomment-798374010
How an SQL statement is converted to a logical plan involves two aspects:

1. what the logical plan is
2. what the output schema is

IMO the principle of least surprise is that the column name becomes what the user wrote in the SQL.

A schema is typically part of the contract between the consumer and producer, and there are non-trivial implications of supporting two namings as a configuration: the resulting schema of a query changes depending on the configuration of the engine. This means that if a user writes the statement `SELECT SQRT(c1) FROM T` and runs it on a system configured as case-insensitive, the resulting schema has the column `sqrt(c1)`, but if the user runs it on a case-sensitive system, the resulting schema has the column `SQRT(c1)`.

The above is the primary reason why I did not introduce this idea before: it will lead either to no one changing that parameter once the system is running, or to a constant fight to keep all systems configured the same way so that a configuration change does not change schemas.

With that said, as long as we are consistent and document the tradeoffs somewhere, I do not have a strong opinion in either direction (i.e. allow case-sensitive or only support one).
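
To make the schema concern concrete, here is a minimal sketch of how one query can produce two different column names depending on a configuration flag. The function `output_column_name` and the `case_sensitive` flag are hypothetical names used only for illustration; they are not the planner's API. Lowercasing the whole expression text is also a simplifying assumption.

```python
def output_column_name(expr_sql: str, case_sensitive: bool) -> str:
    """Hypothetical naming rule for an output column.

    case_sensitive=True  -> keep the expression exactly as the user wrote it.
    case_sensitive=False -> normalize the expression text to lowercase.
    """
    return expr_sql if case_sensitive else expr_sql.lower()


# The expression from `SELECT SQRT(c1) FROM T`.
expr = "SQRT(c1)"

name_sensitive = output_column_name(expr, case_sensitive=True)
name_insensitive = output_column_name(expr, case_sensitive=False)

print(name_sensitive)    # SQRT(c1)
print(name_insensitive)  # sqrt(c1)

# A consumer that looks the column up by name sees a different schema
# depending only on how the engine was configured.
print(name_sensitive == name_insensitive)  # False
```

A consumer that reads the result by column name would work against one configuration and break against the other, which is the contract problem described above.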