[GitHub] [arrow-datafusion] melgenek commented on issue #6349: Sqllogictests doesn't cover cases if the column name is not expected.

via GitHub Sun, 14 May 2023 06:44:23 -0700


melgenek commented on issue #6349:
URL: 
https://github.com/apache/arrow-datafusion/issues/6349#issuecomment-1546903943

As far as I can tell, neither sqllogictest in general nor `sqllogictest-rs` in
particular supports column name checks right now.

There are some ways to extend `sqllogictest-rs` to support column names. It seems
that adding native support would require the following:
1) add an optional `colnames` flag to the `query` clause that switches the
column name check on, with off as the default. Column names could be the
first row in the text representation
```
query II rowsort colnames
select 1 as one, 2 as two;
----
one two
1 2
```
2) update
[DBOutput](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L37)
and
[RecordOutput](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L23)
to have a field `column_names`.
3) update implementations of `sqllogictest::AsyncDB` to return column
names along with types and results.
4) introduce a validator for column names similar to [the type and result
validator](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L416-L450).
The default implementation would probably be a no-op to prevent behavior
changes for library users other than DataFusion. A real implementation
would likely just lowercase the names and compare them as strings.
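
To make steps 1 and 4 more concrete, here is a minimal sketch in plain Rust. Everything in it is hypothetical: `ColumnNameValidator`, `default_column_name_validator`, `lowercase_column_name_validator` and `split_expected` are illustrative names, not part of the `sqllogictest-rs` API, and the sketch assumes the expected column names arrive as the first whitespace-separated row of the expected output.

```rust
/// Hypothetical signature for a column name validator. This is an assumption
/// made for illustration; it is not an existing sqllogictest-rs type.
pub type ColumnNameValidator = fn(actual: &[String], expected: &[String]) -> bool;

/// No-op default: accepts any column names, so existing users see no
/// behavior change.
pub fn default_column_name_validator(_actual: &[String], _expected: &[String]) -> bool {
    true
}

/// Case-insensitive validator: lowercases both sides and compares them
/// position by position.
pub fn lowercase_column_name_validator(actual: &[String], expected: &[String]) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| a.to_lowercase() == e.to_lowercase())
}

/// Splits the expected output into (column names, value rows).
/// When `colnames` is false, or there are no lines at all, every line is
/// treated as a value row and the list of names is empty.
pub fn split_expected(lines: &[&str], colnames: bool) -> (Vec<String>, Vec<String>) {
    match lines.split_first() {
        Some((header, rows)) if colnames => (
            header.split_whitespace().map(str::to_string).collect(),
            rows.iter().map(|r| r.to_string()).collect(),
        ),
        _ => (Vec::new(), lines.iter().map(|r| r.to_string()).collect()),
    }
}
```

With the example from step 1, `split_expected(&["one two", "1 2"], true)` would separate the names `one` and `two` from the single value row `1 2`, and the names could then be handed to whichever validator the runner is configured with.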

-----------
On the other hand, one could make comparisons work on the DataFusion side.

It seems that the `EXPLAIN` statement already gives the alias names for
projections, so it is possible to run the same query twice: once with `EXPLAIN`,
once without. This way both names and values are checked. Of course, the
`EXPLAIN` output contains a lot more information than just the names.

Another way is to create a more powerful version of `arrow_typeof`. For
example, Databricks has a [DESCRIBE
QUERY](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-aux-describe-query.html)
and Snowflake has a [DESCRIBE
RESULT](https://docs.snowflake.com/en/sql-reference/sql/desc-result) that show
the output metadata in a format similar to
```
+---------+------------+
|col_name |data_type   |
+---------+------------+
|one      |int         |
|two      |bigint      |
+---------+------------+
```
This way both specific Arrow types and names could be checked with one query.
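
As a rough illustration of why such a statement would fit sqllogictest well, the sketch below turns output metadata into ordinary text rows that the existing result comparison could check. `ColumnInfo` and `describe_rows` are hypothetical names used only for this example; a real implementation would presumably read names and types from the Arrow schema of the query plan instead of hand-built structs.

```rust
/// One row of a hypothetical DESCRIBE-style result: a column name and its
/// data type, both kept as plain strings for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ColumnInfo {
    pub name: String,
    pub data_type: String,
}

/// Renders the metadata as "name type" rows, one per output column, in the
/// same space-separated shape as the value rows in a test file.
pub fn describe_rows(columns: &[ColumnInfo]) -> Vec<String> {
    columns
        .iter()
        .map(|c| format!("{} {}", c.name, c.data_type))
        .collect()
}
```

Because each column becomes a regular row, a renamed or retyped column would show up as a plain result mismatch, without any change to the test runner.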

