melgenek commented on issue #6349:
URL: 
https://github.com/apache/arrow-datafusion/issues/6349#issuecomment-1546903943

   As far as I can tell, neither sqllogictest in general nor `sqllogictest-rs` 
in particular supports column name checks right now.
   
   There are some ways to extend `sqllogictest-rs` to support column names. It 
seems that adding native support would require the following:
   1) add a `colnames` option to the `query` clause that switches the column 
name check on, with off as the default. Column names could be the first row in 
the text representation:
   ```
   query II rowsort colnames
   select 1 as one, 2 as two;
   ----
   one two
   1     2
   ```
   2) update 
[DBOutput](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L37)
 and 
[RecordOutput](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L23)
 to have a field `column_names`.
   3) update implementations of the `sqllogictest::AsyncDB` to return column 
names along with types and results
   4) introduce a validator for column names, similar to [the type and result 
validator](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L416-L450).
 The default implementation would probably be a no-op to prevent behavior 
changes for library users other than Datafusion. A real implementation 
would likely just lowercase the names and compare strings.
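
   As a rough sketch of steps 1 and 4, assuming nothing about the real 
`sqllogictest-rs` internals: the type alias `ColumnNameValidator` and the 
functions below are hypothetical names that do not exist in the library today. 
The sketch also assumes column names contain no whitespace, since the header 
row is split on whitespace.

```rust
/// Hypothetical validator signature; sqllogictest-rs has no such type today.
pub type ColumnNameValidator = fn(actual: &[String], expected: &[String]) -> bool;

/// No-op default: always passes, so existing library users see no change.
pub fn default_column_name_validator(_actual: &[String], _expected: &[String]) -> bool {
    true
}

/// Strict variant: lowercases both sides and compares them pairwise.
pub fn strict_column_name_validator(actual: &[String], expected: &[String]) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected.iter())
            .all(|(a, e)| a.to_lowercase() == e.to_lowercase())
}

/// When `colnames` is set, treat the first expected row as the header and
/// return it separately from the remaining value rows.
pub fn split_expected(rows: &[String], colnames: bool) -> (Option<Vec<String>>, &[String]) {
    match rows.split_first() {
        Some((header, rest)) if colnames => {
            let names = header.split_whitespace().map(str::to_string).collect();
            (Some(names), rest)
        }
        // Option is off (or there are no rows): every row is a value row.
        _ => (None, rows),
    }
}
```

   With the option off, `split_expected` returns every row untouched, which 
keeps existing test files valid.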
   
   -----------
   On the other hand, one could make comparisons work on the Datafusion side.
   
   It seems that the `EXPLAIN` statement already gives the alias names for 
projections, so it is possible to run the same query twice: once with EXPLAIN, 
once without. This way both names and values are checked. Of course, there is a 
lot more information than just names in an EXPLAIN output.
   
   Another way is to create a more powerful version of `arrow_typeof`. For 
example, Databricks has a [DESCRIBE 
QUERY](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-aux-describe-query.html)
 and Snowflake has a [DESCRIBE 
RESULT](https://docs.snowflake.com/en/sql-reference/sql/desc-result) that show 
the expected output metadata in a format similar to
   ```
     +---------+------------+
     |col_name |data_type   |
     +---------+------------+
     |one      | int        |
     |two      | bigint     |
     +---------+------------+
   ```
   This way both specific Arrow types and names could be checked with one query.
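
   To illustrate why one query would be enough, here is a minimal sketch of 
how such a describe-style statement could flatten result metadata into plain 
text rows. The helper `describe_rows` is a hypothetical name, and the 
name/type pairs stand in for whatever schema the engine exposes; it is not an 
existing Datafusion function.

```rust
/// Hypothetical helper: one output row per column, holding its name and type.
/// `fields` stands in for the (name, data type) pairs of a query's schema.
pub fn describe_rows(fields: &[(&str, &str)]) -> Vec<String> {
    fields
        .iter()
        .map(|(name, data_type)| format!("{} {}", name, data_type))
        .collect()
}
```

   Rows of this shape could then be compared like any other two-column text 
result, so no change to `sqllogictest-rs` would be needed.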

