[ https://issues.apache.org/jira/browse/HIVE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782271#comment-13782271 ]
Johannes Alkjær commented on HIVE-4598:
---------------------------------------
I can trigger the problem when using a reducer script in the subquery (Hive 0.11.0):
{code}
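CREATE TABLE sample (key string, val string);

EXPLAIN
FROM (
  -- inner subquery streams every row through 'cat' as the reducer script
  FROM (SELECT * FROM sample) mapout
  REDUCE * USING 'cat' AS x, y
) reduced
-- two inserts over the same subquery, each with its own WHERE clause
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/a' SELECT * WHERE x = 'a' OR x = 'b'
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/b' SELECT * WHERE x = 'c' OR x = 'd';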
{code}
{code}
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_SUBQUERY
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME sample))) (TOK_INSERT
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
TOK_ALLCOLREF)))) mapout)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
(TOK_SELECT (TOK_SELEXPR (TOK_TRANSFORM (TOK_EXPLIST TOK_ALLCOLREF) TOK_SERDE
TOK_RECORDWRITER 'cat' TOK_SERDE TOK_RECORDREADER (TOK_ALIASLIST x y))))))
reduced)) (TOK_INSERT (TOK_DESTINATION (TOK_LOCAL_DIR '/tmp/a')) (TOK_SELECT
(TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (= (TOK_TABLE_OR_COL x) 'a') (=
(TOK_TABLE_OR_COL x) 'b')))) (TOK_INSERT (TOK_DESTINATION (TOK_LOCAL_DIR
'/tmp/b')) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (=
(TOK_TABLE_OR_COL x) 'c') (= (TOK_TABLE_OR_COL x) 'd')))))
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-0 depends on stages: Stage-2
Stage-1 depends on stages: Stage-2
STAGE PLANS:
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
reduced:mapout:sample
TableScan
alias: sample
Select Operator
expressions:
expr: key
type: string
expr: val
type: string
outputColumnNames: _col0, _col1
Transform Operator
command: cat
output info:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Filter Operator
predicate:
expr: (((_col0 = 'a') or (_col0 = 'b')) and ((_col0 = 'c') or (_col0 = 'd')))
type: boolean
Select Operator
expressions:
expr: _col0
type: string
expr: _col1
type: string
outputColumnNames: _col0, _col1
File Output Operator
compressed: false
GlobalTableId: 1
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Select Operator
expressions:
expr: _col0
type: string
expr: _col1
type: string
outputColumnNames: _col0, _col1
File Output Operator
compressed: false
GlobalTableId: 2
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Move Operator
files:
hdfs directory: false
destination: /tmp/a
Stage: Stage-1
Move Operator
files:
hdfs directory: false
destination: /tmp/b
{code}
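To make the Filter Operator in the plan above concrete, here is a minimal Python sketch (not Hive code) of what that predicate does to the data. The sample rows and the helper names `where_a` and `where_b` are invented for illustration. The sketch only models the predicate semantics and assumes nothing about Hive internals.

```python
# Illustrative rows standing in for the (x, y) output of the 'cat' transform.
# These values are made up for the sketch; they are not from the report.
rows = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")]


def where_a(x):
    """WHERE clause of the insert into /tmp/a."""
    return x == "a" or x == "b"


def where_b(x):
    """WHERE clause of the insert into /tmp/b."""
    return x == "c" or x == "d"


# What the query asks for: each insert applies only its own WHERE clause.
expected_a = [r for r in rows if where_a(r[0])]
expected_b = [r for r in rows if where_b(r[0])]

# The single Filter Operator in the plan: both WHERE clauses joined by AND.
merged = [r for r in rows if where_a(r[0]) and where_b(r[0])]

print("expected /tmp/a:", expected_a)
print("expected /tmp/b:", expected_b)
print("merged filter:  ", merged)
```

Since no value of x can satisfy both WHERE clauses at once, the merged predicate passes no rows. The plan also lists no separate Filter Operator ahead of the second File Output Operator (GlobalTableId: 2), although the operator nesting cannot be read reliably from the flattened output above.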
> Incorrect results when using subquery in multi table insert
> -----------------------------------------------------------
>
> Key: HIVE-4598
> URL: https://issues.apache.org/jira/browse/HIVE-4598
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0, 0.11.0
> Reporter: Sebastian
>
> I'm using a multi table insert like this:
> FROM <x>
> INSERT INTO TABLE t PARTITION (type='x')
> SELECT * WHERE type='x'
> INSERT INTO TABLE t PARTITION (type='y')
> SELECT * WHERE type='y';
> Now when <x> is the name of a table, everything works as expected.
> However, if I use a subquery as <x>, the query runs but it inserts all results
> from the subquery into each partition, as if there were no "WHERE" clauses in
> the selects.