cloud-fan commented on code in PR #55984:
URL: https://github.com/apache/spark/pull/55984#discussion_r3287828550


##########
sql/core/src/test/resources/sql-tests/inputs/listagg.sql:
##########
@@ -61,3 +61,18 @@ SELECT listagg(DISTINCT col1) WITHIN GROUP (ORDER BY col1, 
col2) FROM df;
 SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES 
(cast(1.1 as double)), (cast(2.2 as double)), (cast(2.2 as double)), (cast(3.3 
as double)) AS t(col);
 SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES 
(cast(1.0 as float)), (cast(2.0 as float)), (cast(2.0 as float)) AS t(col);
 SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES 
(TIMESTAMP'2024-01-01 10:00:00'), (TIMESTAMP'2024-01-02 12:00:00'), 
(TIMESTAMP'2024-01-01 10:00:00') AS t(col);
+
+-- LISTAGG with semi-structured extract (parser wraps v:a in Alias with fresh 
ExprId)
+-- Tests that isOrderCompatible strips Alias wrappers before comparing via 
semanticEquals
+SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) 
FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": 
"y"}') UNION ALL SELECT parse_json('{"a": "x"}'));
+-- Semi-structured extract without DISTINCT
+SELECT listagg(v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) FROM 
(SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}') 
UNION ALL SELECT parse_json('{"a": "x"}'));
+-- Semi-structured extract with DESC ordering
+SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string 
DESC) FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT 
parse_json('{"a": "y"}') UNION ALL SELECT parse_json('{"a": "x"}'));
+-- Semi-structured extract with nested path
+SELECT listagg(DISTINCT v:a.b::string, ',') WITHIN GROUP (ORDER BY 
v:a.b::string) FROM (SELECT parse_json('{"a": {"b": "x"}}') v UNION ALL SELECT 
parse_json('{"a": {"b": "y"}}') UNION ALL SELECT parse_json('{"a": {"b": 
"x"}}'));
+-- Semi-structured extract with GROUP BY
+SELECT grp, listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY 
v:a::string) FROM (SELECT 1 grp, parse_json('{"a": "x"}') v UNION ALL SELECT 1, 
parse_json('{"a": "y"}') UNION ALL SELECT 2, parse_json('{"a": "x"}') UNION ALL 
SELECT 2, parse_json('{"a": "x"}') UNION ALL SELECT 1, parse_json('{"a": 
"x"}')) GROUP BY grp;
+-- Semi-structured extract: DISTINCT cast with non-equality-preserving order 
(variant)
+-- Tests that checkOrderValueDeterminism strips Alias wrappers before 
comparing via semanticEquals
+SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a) FROM 
(SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}') 
UNION ALL SELECT parse_json('{"a": "x"}'));

Review Comment:
   The comment above says this exercises `checkOrderValueDeterminism`'s alias 
stripping, but `ORDER BY v:a` is VARIANT and fails with 
`DATATYPE_MISMATCH.INVALID_ORDERING_TYPE` from `SortOrder.checkInputDataTypes` 
before listagg's checks run — see this query's `analyzer-results` golden 
output: only that error is raised, no 
`functionAndOrderExpressionUnsafeCastError`. The test would pass identically 
with or without the new `trimAliases` you added at line 684.
   
   To actually hit the `Cast(castChild, ...)` arm in 
`checkOrderValueDeterminism`, the order column needs to be orderable but 
non-equality-preserving when cast to String, e.g. `LISTAGG(DISTINCT 
(v:a)::double::string, ',') WITHIN GROUP (ORDER BY (v:a)::double)`. Without the 
trim, the single-pass aliases inside `castChild` and 
`orderExpressions.head.child` mismatch and you'd get `NonDeterministicMismatch` 
→ `functionAndOrderExpressionMismatchError`; with the trim, they match and you 
get `NonDeterministicCast(Double, String)` → 
`functionAndOrderExpressionUnsafeCastError`. That divergence is what pins the 
fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to