[jira] [Commented] (HIVE-4598) Incorrect results when using subquery in multi table insert

JIRA Mon, 30 Sep 2013 14:31:11 -0700

    [ https://issues.apache.org/jira/browse/HIVE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782271#comment-13782271 ]


Johannes Alkjær commented on HIVE-4598:
---------------------------------------

I can trigger the problem on Hive 0.11.0 when the subquery uses a reducer script. The query and its EXPLAIN output:

{code}
CREATE TABLE sample ( key string, val string);

EXPLAIN
FROM (
    FROM ( SELECT * FROM sample ) mapout REDUCE * USING 'cat' AS x, y
) reduced
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/a' SELECT * WHERE x = 'a' OR x = 'b'
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/b' SELECT * WHERE x = 'c' OR x = 'd';
{code}

{code}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_SUBQUERY 
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME sample))) (TOK_INSERT 
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)))) mapout)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) 
(TOK_SELECT (TOK_SELEXPR (TOK_TRANSFORM (TOK_EXPLIST TOK_ALLCOLREF) TOK_SERDE 
TOK_RECORDWRITER 'cat' TOK_SERDE TOK_RECORDREADER (TOK_ALIASLIST x y)))))) 
reduced)) (TOK_INSERT (TOK_DESTINATION (TOK_LOCAL_DIR '/tmp/a')) (TOK_SELECT 
(TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (= (TOK_TABLE_OR_COL x) 'a') (= 
(TOK_TABLE_OR_COL x) 'b')))) (TOK_INSERT (TOK_DESTINATION (TOK_LOCAL_DIR 
'/tmp/b')) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (= 
(TOK_TABLE_OR_COL x) 'c') (= (TOK_TABLE_OR_COL x) 'd')))))

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        reduced:mapout:sample 
          TableScan
            alias: sample
            Select Operator
              expressions:
                    expr: key
                    type: string
                    expr: val
                    type: string
              outputColumnNames: _col0, _col1
              Transform Operator
                command: cat
                output info:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                Filter Operator
                  predicate:
                      expr: (((_col0 = 'a') or (_col0 = 'b')) and ((_col0 = 'c') or (_col0 = 'd')))
                      type: boolean
                  Select Operator
                    expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                    outputColumnNames: _col0, _col1
                    File Output Operator
                      compressed: false
                      GlobalTableId: 1
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  Select Operator
                    expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                    outputColumnNames: _col0, _col1
                    File Output Operator
                      compressed: false
                      GlobalTableId: 2
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Move Operator
      files:
          hdfs directory: false
          destination: /tmp/a

  Stage: Stage-1
    Move Operator
      files:
          hdfs directory: false
          destination: /tmp/b
{code}
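
To spell out what looks wrong in the plan: the two WHERE clauses appear to have been combined with "and" into the single Filter Operator that feeds both File Output Operators, rather than each insert getting its own filter. A minimal Python sketch of the difference follows. It is not Hive code; the sample rows and the helper names wants_a / wants_b are made up for illustration.

```python
# Hypothetical sample rows (x, y); not taken from the original report.
rows = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")]


def wants_a(x):
    # WHERE clause of the first insert (destination /tmp/a).
    return x == "a" or x == "b"


def wants_b(x):
    # WHERE clause of the second insert (destination /tmp/b).
    return x == "c" or x == "d"


# Expected semantics: each insert branch applies only its own predicate.
expected_a = [r for r in rows if wants_a(r[0])]
expected_b = [r for r in rows if wants_b(r[0])]

# What the printed plan suggests: one shared filter ANDs both predicates,
# and its output is fanned out unchanged to both File Output Operators.
merged = [r for r in rows if wants_a(r[0]) and wants_b(r[0])]
planned_a = merged
planned_b = merged

print("expected /tmp/a:", expected_a)
print("expected /tmp/b:", expected_b)
print("planned  /tmp/a:", planned_a)
print("planned  /tmp/b:", planned_b)
```

With this input the merged filter passes no rows, since x cannot be in both {a, b} and {c, d}. If the plan runs as printed, both /tmp/a and /tmp/b would be expected to come out empty.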


> Incorrect results when using subquery in multi table insert
> -----------------------------------------------------------
>
>                 Key: HIVE-4598
>                 URL: https://issues.apache.org/jira/browse/HIVE-4598
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: Sebastian
>
> I'm using a multi table insert like this:
> FROM <x>
> INSERT INTO TABLE t PARTITION (type='x')
> SELECT * WHERE type='x'
> INSERT INTO TABLE t PARTITION (type='y')
> SELECT * WHERE type='y';
> Now when <x> is the name of a table, everything works as expected.
> However if I use a subquery as <x>, the query runs but it inserts all results 
> from the subquery into each partition, as if there were no "WHERE" clauses in 
> the selects.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
