zabetak commented on code in PR #5978: URL: https://github.com/apache/hive/pull/5978#discussion_r2207277470
########## ql/src/test/queries/clientpositive/udtf_with_unionall.q: ########## @@ -0,0 +1,55 @@ +SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 + UNION ALL + SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2 + UNION ALL + SELECT STACK(2,10,'X',20,'Y'); + +EXPLAIN SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 + UNION ALL + SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2 + UNION ALL + SELECT STACK(2,10,'X',20,'Y'); + +SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) + UNION ALL + SELECT STACK(2,'X',10,date '2017-01-01','Y',20,date '2018-01-01'); + +EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) + UNION ALL + SELECT STACK(2,'X',30,date '2017-01-01','Y',40,date '2018-01-01'); + +SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02'))) + UNION ALL + SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02'))); + +EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02'))) + UNION ALL + SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02'))); + +SELECT EXPLODE(map('A',10,'B',20,'C',30)) + UNION ALL + SELECT EXPLODE(map('X',70,'Y',80,'Z',90)); Review Comment: I like the fact that we are testing various UDTFs with UNION ALL but not sure under what criteria we picked the combinations to test. Is it exhaustive? Is it on most commonly used? Is it something else? The fix is localized in a single rule so if we want to add test cases for this purpose it feels more appropriate to do it via unit tests (see Test*Rule.java classes) if that's feasible. ########## ql/src/test/queries/clientpositive/udtf_with_unionall.q: ########## @@ -0,0 +1,55 @@ +SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 + UNION ALL + SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2 + UNION ALL + SELECT STACK(2,10,'X',20,'Y'); + +EXPLAIN SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 + UNION ALL + SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2 + UNION ALL + SELECT STACK(2,10,'X',20,'Y'); + +SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) + UNION ALL + SELECT STACK(2,'X',10,date '2017-01-01','Y',20,date '2018-01-01'); + +EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) + UNION ALL + SELECT STACK(2,'X',30,date '2017-01-01','Y',40,date '2018-01-01'); + +SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02'))) + UNION ALL + SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02'))); + +EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02'))) + UNION ALL + SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02'))); + +SELECT EXPLODE(map('A',10,'B',20,'C',30)) + UNION ALL + SELECT EXPLODE(map('X',70,'Y',80,'Z',90)); + +EXPLAIN SELECT EXPLODE(map('A',10,'B',20,'C',30)) + UNION ALL + SELECT EXPLODE(map('X',70,'Y',80,'Z',90)); + +SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02'))) + UNION ALL + SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02'))) + UNION ALL + SELECT STACK(2,'X',50,date '2017-01-01','Y',60,date '2017-01-01'); + +EXPLAIN SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2015-02-02'))) + UNION ALL + SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date '2016-02-02'))) + UNION ALL + SELECT STACK(2,'X',50,date '2017-01-01','Y',60,date '2017-01-01'); Review Comment: nit: No new line at the end of file. ########## ql/src/test/queries/clientpositive/udtf_with_unionall.q: ########## @@ -0,0 +1,55 @@ +SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 + UNION ALL + SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2 Review Comment: Do we need two `VALUES` branches to trigger the problem or one suffices? Do we need three rows on each `VALUES`? Whatever is not needed to repro the problem should be dropped. Keeping tests minimal speeds-up reviews, improves readability, and minimizes flakiness in the longterm. ########## ql/src/test/queries/clientpositive/udtf_with_unionall.q: ########## @@ -0,0 +1,55 @@ +SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); + +SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 + UNION ALL + SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20, 'violet')) AS Colors2 + UNION ALL + SELECT STACK(2,10,'X',20,'Y'); + +EXPLAIN SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1 Review Comment: Since the problem is in CBO we don't care about the final (physical plan). `EXPLAIN CBO` is sufficient, easier to read, less subject to changes due to modifications in the compiler. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveUnionSimpleSelectsToInlineTableRule.java: ########## @@ -218,6 +218,9 @@ private boolean isInlineTableOperand(RelNode input) { if (input.getInputs().size() == 0) { return true; } + if (!(((RexCall)((HiveTableFunctionScan) input).getCall()).getOperands().get(0) instanceof RexCall)) { + return false; + } Review Comment: This check is not very readable and it also makes various assumptions about the type and operands of `TableFunctionScan#getCall` that may lead to other `ClassCastException` or other problems. Maybe we should consider adding a helper function and putting all the necessary checks there. For instance: ```java private static boolean isInlineArrayCall(RexNode expr) { if (!(expr instanceof RexCall)) { return false; } RexCall call = (RexCall) expr; SqlOperator op = call.getOperator(); if (!"inline".equalsIgnoreCase(op.getName())) { return false; } if (call.getOperands().size() != 1) { return false; } RexNode operand = call.getOperands().get(0); if (!(operand instanceof RexCall)) { return false; } call = (RexCall) operand; return "array".equalsIgnoreCase(call.getOperator().getName()); } ``` It is a bit verbose but not sure if there is a better way to put it :/ Other than that I am not sure if we should make the check here or rather in Line 145-146 as you initially proposed. ########## ql/src/test/queries/clientpositive/udtf_with_unionall.q: ########## @@ -0,0 +1,55 @@ +SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date '2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3) + UNION ALL + SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date '2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3); Review Comment: The `ClassCastException` occurs during compilation so testing result correctness is slightly out of scope for the current PR. It is probably fine to add one end-to-end test but not sure if we need all of them. Does each test in this file contribute to additional test coverage? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org