GitHub user 0eu created a discussion: Refactor Pinot sqlglot dialect tests to 
use AST comparison instead of sub-string matching

**Background**

Currently, many of Pinot dialect-specific tests rely on direct string 
comparisons or substring checks to verify that SQL is being parsed and 
generated correctly.

Example:
```python
def test_cast_to_string() -> None:
    """
    Test that CAST to STRING is preserved (not converted to CHAR).
    """
    sql = "SELECT CAST(cohort_size AS STRING) FROM table"
    ast = sqlglot.parse_one(sql, Pinot)
    generated = Pinot().generate(expression=ast)

    assert "STRING" in generated
    assert "CHAR" not in generated
```

**Problem**
String-based assertions are highly sensitive to:
- Small changes in the sqlglot generator's "pretty print" logic can break the 
entire test suite.
- Dialect-specific normalization (upper vs. lower case) can lead to false 
negatives.
- We are often testing sqlglot's formatting engine rather than the semantic 
correctness of the dialect translation.
- Last, but not least, these tests are a bit more difficult to read because 
these substring checks are difficult to reason about.

**Proposed Solution**
I propose refactoring these tests to leverage sqlglot’s ability to compare 
expressions directly. By comparing the resulting AST of the generated SQL 
against an expected SQL string's AST, we ensure semantic equivalence without 
being tied to a specific string format.

Example:
```python
# Instead of comparing strings:
# assert generated_string == expected_string

# We compare ASTs:
expected_ast = sqlglot.parse_one(expected_sql, read=Pinot)
generated_ast = sqlglot.parse_one(generated_sql, read=Pinot)

assert generated_ast == expected_ast
```

**I’m volunteering to work on this!**
I have already analyzed the existing 
[test_pinot](https://github.com/apache/superset/blob/28c802fb6c7f6cef7d22d4793f4e5bfe0d16cbd5/tests/unit_tests/sql/dialects/pinot_tests.py#L505)
 and am ready to submit a PR implementing this pattern.

GitHub link: https://github.com/apache/superset/discussions/36833

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to