jorgecarleitao commented on pull request #9600:
URL: https://github.com/apache/arrow/pull/9600#issuecomment-798374010


   Converting an SQL statement to a logical plan involves two aspects:
   
   1. what the logical plan is
   2. what the output schema is
   
   IMO the principle of least surprise is that the column name becomes what the 
user wrote in the SQL.
   
   A schema is typically part of the contract between the consumer and the 
producer, and offering two naming conventions as a configuration option has 
non-trivial implications: the resulting schema of a query changes depending on 
how the engine is configured. For example, if a user writes the statement 
`SELECT SQRT(c1) FROM T` and runs it on a system configured as case-insensitive, 
the resulting column is named `sqrt(c1)`, but if the user runs it on a 
case-sensitive system, the resulting column is named `SQRT(c1)`.
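
   A minimal Rust sketch of the effect described above, assuming a hypothetical `output_column_name` helper that lowercases the function name when the engine is case-insensitive. This is an illustration only, not DataFusion's actual naming logic:

```rust
/// Hypothetical helper (not a DataFusion API): derives the output column
/// name for a single-argument function call such as `SQRT(c1)`.
fn output_column_name(func: &str, arg: &str, case_sensitive: bool) -> String {
    // In a case-insensitive configuration, the function name is normalized
    // to lowercase before it becomes part of the output schema.
    let name = if case_sensitive {
        func.to_string()
    } else {
        func.to_lowercase()
    };
    format!("{}({})", name, arg)
}

fn main() {
    // The same statement, `SELECT SQRT(c1) FROM T`, yields two different
    // column names depending only on the engine configuration.
    let insensitive = output_column_name("SQRT", "c1", false);
    let sensitive = output_column_name("SQRT", "c1", true);
    println!("case-insensitive: {}", insensitive);
    println!("case-sensitive:   {}", sensitive);
    assert_ne!(insensitive, sensitive);
}
```

   Any consumer that looks up the column by name would therefore need to know which configuration produced the result.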
   
   The above is the primary reason why I did not introduce this idea before: it 
will lead either to no one ever changing that parameter once the system is 
running, or to a constant struggle to keep all systems configured the same way 
so that a configuration change does not alter the schemas they produce.
   
   With that said, as long as we are consistent and document the tradeoffs 
somewhere, I do not have a strong opinion in either direction (i.e. making case 
sensitivity configurable or supporting only one mode).

