zabetak commented on code in PR #5978:
URL: https://github.com/apache/hive/pull/5978#discussion_r2207277470
##########
ql/src/test/queries/clientpositive/udtf_with_unionall.q:
##########
@@ -0,0 +1,55 @@
+SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
+ UNION ALL
+ SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20,
'violet')) AS Colors2
+ UNION ALL
+ SELECT STACK(2,10,'X',20,'Y');
+
+EXPLAIN SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
+ UNION ALL
+ SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20,
'violet')) AS Colors2
+ UNION ALL
+ SELECT STACK(2,10,'X',20,'Y');
+
+SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date
'2016-02-02')))
+ UNION ALL
+ SELECT STACK(2,'X',10,date '2017-01-01','Y',20,date '2018-01-01');
+
+EXPLAIN SELECT INLINE(array(struct('A',10,date
'2015-01-01'),struct('B',20,date '2016-02-02')))
+ UNION ALL
+ SELECT STACK(2,'X',30,date '2017-01-01','Y',40,date '2018-01-01');
+
+SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date
'2015-02-02')))
+ UNION ALL
+ SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date
'2016-02-02')));
+
+EXPLAIN SELECT INLINE(array(struct('A',10,date
'2015-01-01'),struct('B',20,date '2015-02-02')))
+ UNION ALL
+ SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date
'2016-02-02')));
+
+SELECT EXPLODE(map('A',10,'B',20,'C',30))
+ UNION ALL
+ SELECT EXPLODE(map('X',70,'Y',80,'Z',90));
Review Comment:
I like the fact that we are testing various UDTFs with UNION ALL but not
sure under what criteria we picked the combinations to test. Is it exhaustive?
Is it on most commonly used? Is it something else?
The fix is localized in a single rule so if we want to add test cases for
this purpose it feels more appropriate to do it via unit tests (see
Test*Rule.java classes) if that's feasible.
##########
ql/src/test/queries/clientpositive/udtf_with_unionall.q:
##########
@@ -0,0 +1,55 @@
+SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
+ UNION ALL
+ SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20,
'violet')) AS Colors2
+ UNION ALL
+ SELECT STACK(2,10,'X',20,'Y');
+
+EXPLAIN SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
+ UNION ALL
+ SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20,
'violet')) AS Colors2
+ UNION ALL
+ SELECT STACK(2,10,'X',20,'Y');
+
+SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date
'2016-02-02')))
+ UNION ALL
+ SELECT STACK(2,'X',10,date '2017-01-01','Y',20,date '2018-01-01');
+
+EXPLAIN SELECT INLINE(array(struct('A',10,date
'2015-01-01'),struct('B',20,date '2016-02-02')))
+ UNION ALL
+ SELECT STACK(2,'X',30,date '2017-01-01','Y',40,date '2018-01-01');
+
+SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date
'2015-02-02')))
+ UNION ALL
+ SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date
'2016-02-02')));
+
+EXPLAIN SELECT INLINE(array(struct('A',10,date
'2015-01-01'),struct('B',20,date '2015-02-02')))
+ UNION ALL
+ SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date
'2016-02-02')));
+
+SELECT EXPLODE(map('A',10,'B',20,'C',30))
+ UNION ALL
+ SELECT EXPLODE(map('X',70,'Y',80,'Z',90));
+
+EXPLAIN SELECT EXPLODE(map('A',10,'B',20,'C',30))
+ UNION ALL
+ SELECT EXPLODE(map('X',70,'Y',80,'Z',90));
+
+SELECT INLINE(array(struct('A',10,date '2015-01-01'),struct('B',20,date
'2015-02-02')))
+ UNION ALL
+ SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date
'2016-02-02')))
+ UNION ALL
+ SELECT STACK(2,'X',50,date '2017-01-01','Y',60,date '2017-01-01');
+
+EXPLAIN SELECT INLINE(array(struct('A',10,date
'2015-01-01'),struct('B',20,date '2015-02-02')))
+ UNION ALL
+ SELECT INLINE(array(struct('C',30,date '2016-01-01'),struct('D',40,date
'2016-02-02')))
+ UNION ALL
+ SELECT STACK(2,'X',50,date '2017-01-01','Y',60,date '2017-01-01');
Review Comment:
nit: No new line at the end of file.
##########
ql/src/test/queries/clientpositive/udtf_with_unionall.q:
##########
@@ -0,0 +1,55 @@
+SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
+ UNION ALL
+ SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20,
'violet')) AS Colors2
Review Comment:
Do we need two `VALUES` branches to trigger the problem or one suffices? Do
we need three rows on each `VALUES`?
Whatever is not needed to repro the problem should be dropped. Keeping tests
minimal speeds-up reviews, improves readability, and minimizes flakiness in the
longterm.
##########
ql/src/test/queries/clientpositive/udtf_with_unionall.q:
##########
@@ -0,0 +1,55 @@
+SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+EXPLAIN SELECT stack(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
+
+SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
+ UNION ALL
+ SELECT * FROM (VALUES(10, 'green'), (11, 'blue'), (12, 'indigo'), (20,
'violet')) AS Colors2
+ UNION ALL
+ SELECT STACK(2,10,'X',20,'Y');
+
+EXPLAIN SELECT * FROM (VALUES(1, '1'), (2, 'orange'), (5, 'yellow')) AS Colors1
Review Comment:
Since the problem is in CBO we don't care about the final (physical plan).
`EXPLAIN CBO` is sufficient, easier to read, less subject to changes due to
modifications in the compiler.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveUnionSimpleSelectsToInlineTableRule.java:
##########
@@ -218,6 +218,9 @@ private boolean isInlineTableOperand(RelNode input) {
if (input.getInputs().size() == 0) {
return true;
}
+ if (!(((RexCall)((HiveTableFunctionScan)
input).getCall()).getOperands().get(0) instanceof RexCall)) {
+ return false;
+ }
Review Comment:
This check is not very readable and it also makes various assumptions about
the type and operands of `TableFunctionScan#getCall` that may lead to other
`ClassCastException` or other problems. Maybe we should consider adding a
helper function and putting all the necessary checks there. For instance:
```java
private static boolean isInlineArrayCall(RexNode expr) {
if (!(expr instanceof RexCall)) {
return false;
}
RexCall call = (RexCall) expr;
SqlOperator op = call.getOperator();
if (!"inline".equalsIgnoreCase(op.getName())) {
return false;
}
if (call.getOperands().size() != 1) {
return false;
}
RexNode operand = call.getOperands().get(0);
if (!(operand instanceof RexCall)) {
return false;
}
call = (RexCall) operand;
return "array".equalsIgnoreCase(call.getOperator().getName());
}
```
It is a bit verbose but not sure if there is a better way to put it :/
Other than that I am not sure if we should make the check here or rather in
Line 145-146 as you initially proposed.
##########
ql/src/test/queries/clientpositive/udtf_with_unionall.q:
##########
@@ -0,0 +1,55 @@
+SELECT STACK(3,'A',10,date '2015-01-01','z','B',20,date
'2016-01-01','y','C',30,date '2017-08-09','x') AS (col0,col1,col2,col3)
+ UNION ALL
+ SELECT STACK(3,'A',10,date '2015-01-01','n','B',20,date
'2016-01-01','m','C',30,date '2017-08-09','l') AS (col0,col1,col2,col3);
Review Comment:
The `ClassCastException` occurs during compilation so testing result
correctness is slightly out of scope for the current PR. It is probably fine to
add one end-to-end test but not sure if we need all of them. Does each test in
this file contribute to additional test coverage?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]