scarlin-cloudera commented on code in PR #4783: URL: https://github.com/apache/hive/pull/4783#discussion_r1355474928
########## ql/src/test/results/clientpositive/llap/multi_insert_gby5.q.out: ########## @@ -0,0 +1,250 @@ +PREHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl1 +POSTHOOK: query: CREATE TABLE tbl1 (key int, f1 int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl1 +PREHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@tbl2 +POSTHOOK: query: CREATE TABLE tbl2 (f1 int) PARTITIONED BY (key int) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@tbl2 +PREHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +PREHOOK: type: QUERY +PREHOOK: Input: default@tbl1 +PREHOOK: Output: default@tbl2@key=5 +PREHOOK: Output: default@tbl2@key=6 +POSTHOOK: query: EXPLAIN FROM (SELECT key, f1 FROM tbl1 WHERE key=5) a +INSERT OVERWRITE TABLE tbl2 PARTITION(key=5) +SELECT f1 WHERE key > 0 GROUP BY f1 +INSERT OVERWRITE TABLE tbl2 partition(key=6) +SELECT f1 WHERE key > 0 GROUP BY f1 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@tbl1 +POSTHOOK: Output: default@tbl2@key=5 +POSTHOOK: Output: default@tbl2@key=6 +STAGE DEPENDENCIES: + Stage-2 is a root stage + Stage-3 depends on stages: Stage-2 + Stage-0 depends on stages: Stage-3 + Stage-4 depends on stages: Stage-0 + Stage-1 depends on stages: Stage-3 + Stage-5 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-2 + Tez +#### A masked pattern was here #### + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) + Reducer 3 <- Reducer 2 (SIMPLE_EDGE) + Reducer 4 <- Reducer 2 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: tbl1 + 
filterExpr: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (key = 5) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: f1 (type: int) + outputColumnNames: _col1 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (5 > 0) (type: boolean) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col1 (type: int) + null sort order: z + sort order: + + Map-reduce partition columns: _col1 (type: int) + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + value expressions: 5 (type: int) + Execution mode: vectorized, llap + LLAP IO: all inputs + Reducer 2 + Execution mode: llap + Reduce Operator Tree: + Forward + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE + Filter Operator + predicate: (VALUE._col0 > 0) (type: boolean)
Review Comment: The same comment as above applies here. No CBO is applied to the INSERT OVERWRITE portion. You're right that it should be stripped out, but that would entail a bigger fix. In fact, I would go so far as to say that I have not yet come up with an example that truly makes sense from an end-user point of view. The bug only occurs when constant folding happens in the CBO portion and a filter is applied on the same column in the INSERT OVERWRITE. From a technical point of view, the query is correct. From a practical point of view, the query doesn't make sense. A customer did hit this issue, and it turned out their query was incorrect. They still requested that it be fixed. I'm almost tempted to let the bug exist: if the query isn't something the end user would want, perhaps they should get the failure at compile time rather than at runtime?
But the purist in me says that a correct query should go through, which is why I think we should still fix this.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org