[ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6395:
----------------------------

    Description: 
{noformat}
set hive.optimize.ppd=true;
add file ./test.py;

from (select transform(test.*) using 'python ./test.py'
as id,name,state from test) t0
insert overwrite table test2 select * where state=1
insert overwrite table test3 select * where state=2;
{noformat}

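In the above example, the select transform returns an extra column (state), and 
that column is used in the where clauses of the multi-insert selects.  However, 
if hive.optimize.ppd is on, the query plan is wrong, because the two predicates 
are merged into a single filter that no row can satisfy: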

filter (state=1 and state=2) //impossible
--> select, insert into test2
--> select, insert into test3

The correct query plan, produced with hive.optimize.ppd=false, is:
filter (state=1)
--> select, insert into test2
filter (state=2)
--> select, insert into test3
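
To make the difference between the two plans concrete, here is a small simulation in plain Python. It is only a sketch, not Hive code: the rows, the predicate helpers and run_multi_insert are all made up for illustration, and it models a plan as one shared filter followed by one filter per insert branch.

```python
# Hypothetical (id, name, state) rows, as the transform script might emit them.
rows = [(1, "a", 1), (2, "b", 2), (3, "c", 3)]


def is_state_1(row):
    return row[2] == 1


def is_state_2(row):
    return row[2] == 2


def always(row):
    return True


def run_multi_insert(rows, shared_filter, branch_filters):
    """Apply one shared filter, then one filter per insert branch."""
    survivors = [r for r in rows if shared_filter(r)]
    return [[r for r in survivors if f(r)] for f in branch_filters]


# Faulty plan: both predicates are merged with AND into one shared filter,
# and the branches no longer filter anything themselves.
faulty = run_multi_insert(
    rows, lambda r: is_state_1(r) and is_state_2(r), [always, always]
)

# Correct plan: no shared filter, each branch keeps its own predicate.
correct = run_multi_insert(rows, always, [is_state_1, is_state_2])

print(faulty)   # both branches are empty
print(correct)  # one matching row per branch
```

Under these assumptions the merged filter leaves both target tables empty, while the per-branch filters route one row to each table.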

For reference
{noformat}
create table test (id int, name string);
create table test2(id int, name string, state int);
create table test3(id int, name string, state int);
{noformat}
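
The contents of the attached test.py are not shown in this message. For anyone trying to reproduce without the attachment, a stand-in along the following lines should behave similarly. This is an assumption, not the attached script: it relies on Hive passing rows to the transform script as tab-separated lines on stdin, and the rule used to compute state (odd id gives 1, even id gives 2) is invented purely so that both where clauses can match some rows.

```python
import sys


def add_state(line):
    """Append a hypothetical state column to one tab-separated input row."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: state is derived from the parity of id.
    # The real test.py attached to the issue may compute it differently.
    state = 1 if int(fields[0]) % 2 else 2
    return "\t".join(fields + [str(state)])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read (id, name) rows from Hive and write (id, name, state) rows back.
    for line in stdin:
        if line.strip():
            stdout.write(add_state(line) + "\n")


if __name__ == "__main__":
    main()
```

With such a script, every row of test should end up in either test2 or test3 when the plan is correct.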

  was:
{code}
set hive.optimize.ppd=true;
add file ./test.py;

from (select transform(test.*) using 'python ./test.py'
as id,name,state from test) t0
insert overwrite table test2 select * where state=1
insert overwrite table test3 select * where state=2;
{code}

In the above example, the select transform returns an extra column, and that 
column is used in where clause of the multi-insert selects.  However, if 
optimize is on, the query plan is wrong:

filter (state=1 and state=2) //impossible
--> select, insert into test1
--> select, insert into test2

The correct query plan for hive.optimize.ppd=false is:
filter (state=1)
--> select, insert into test1
filter (state=2)
--> select, insert into test2


> multi-table insert from select transform fails if optimize.ppd enabled
> ----------------------------------------------------------------------
>
>                 Key: HIVE-6395
>                 URL: https://issues.apache.org/jira/browse/HIVE-6395
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Szehon Ho
>         Attachments: test.py
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
