[jira] [Updated] (HIVE-21950) Optimizer complicates execution plan of queries with SUBSTR function in EXISTS clause

2019-07-03 Thread Murshid Chalaev (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Murshid Chalaev updated HIVE-21950:
---
Description: 
Queries with SUBSTR function in EXISTS clause have much more complicated 
execution plan in Hive-2.3 with enabled CBO then with disabled. A query below 
has 8 stages which submit 4 MR jobs in Hive-2.3 when CBO is enabled, while with 
disabled it has 4 stages and submits 1 MR job.

 *STEPS TO REPRODUCE:*
{code:java}
CREATE TABLE i1122 (id STRING);
INSERT INTO i1122 VALUES (1),(1001); 

EXPLAIN
SELECT *
FROM i1122 AS t1
WHERE EXISTS (
SELECT 1
FROM i1122 AS t2
WHERE t2.id = substr(t1.id,4)
);{code}

*ACTUAL RESULT:*
Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
Hive-1.2 with enabled CBO):
{code:java}
hive> SET hive.cbo.enable=false;
hive> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
...
{code}

Explain plan in Hive-2.3 with enabled CBO:

{code:java}
hive> SET hive.cbo.enable=true;
hive> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );
OK
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-8 depends on stages: Stage-3
  Stage-6 depends on stages: Stage-8
  Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
  Stage-7 has a backup stage: Stage-1
  Stage-4 depends on stages: Stage-7
  Stage-1
  Stage-0 depends on stages: Stage-4, Stage-1

STAGE PLANS:
...
{code}

Full explain plans you can find in attachment

  was:
Queries with SUBSTR function in EXISTS clause have much more complicated 
execution plan in Hive-2.3 with enabled CBO then in Hive-2.3 with disabled CBO. 
A query below has 8 stages which submit 4 MR jobs in Hive-2.3 with enabled CBO, 
while with disabled CBO it has 4 stages and submits 1 MR job.

 *STEPS TO REPRODUCE:*
{code:java}
CREATE TABLE i1122 (id STRING);
INSERT INTO i1122 VALUES (1),(1001); 

EXPLAIN
SELECT *
FROM i1122 AS t1
WHERE EXISTS (
SELECT 1
FROM i1122 AS t2
WHERE t2.id = substr(t1.id,4)
);{code}

*ACTUAL RESULT:*
Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
Hive-1.2 with enabled CBO):
{code:java}
hive> SET hive.cbo.enable=false;
hive> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
...
{code}

Explain plan in Hive-2.3 with enabled CBO:

{code:java}
hive> SET hive.cbo.enable=true;
hive> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );
OK
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-8 depends on stages: Stage-3
  Stage-6 depends on stages: Stage-8
  Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
  Stage-7 has a backup stage: Stage-1
  Stage-4 depends on stages: Stage-7
  Stage-1
  Stage-0 depends on stages: Stage-4, Stage-1

STAGE PLANS:
...
{code}

Full explain plans you can find in attachment


> Optimizer complicates execution plan of queries with SUBSTR function in 
> EXISTS clause
> -
>
> Key: HIVE-21950
> URL: https://issues.apache.org/jira/browse/HIVE-21950
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.1.1
>Reporter: Murshid Chalaev
>Priority: Minor
> Attachments: disabled_CBO.txt, enabled_CBO.txt
>
>
> Queries with SUBSTR function in EXISTS clause have much more complicated 
> execution plan in Hive-2.3 with enabled CBO then with disabled. A query below 
> has 8 stages which submit 4 MR jobs in Hive-2.3 when CBO is enabled, while 
> with disabled it has 4 stages and submits 1 MR job.
>  *STEPS TO REPRODUCE:*
> {code:java}
> CREATE TABLE i1122 (id STRING);
> INSERT INTO i1122 VALUES (1),(1001); 
> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );{code}
> *ACTUAL RESULT:*
> Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
> Hive-1.2 with enabled CBO):
> {code:java}
> hive> SET hive.cbo.enable=false;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root 

[jira] [Updated] (HIVE-21950) Optimizer complicates execution plan of queries with SUBSTR function in EXISTS clause

2019-07-03 Thread Murshid Chalaev (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Murshid Chalaev updated HIVE-21950:
---
Attachment: enabled_CBO.txt

> Optimizer complicates execution plan of queries with SUBSTR function in 
> EXISTS clause
> -
>
> Key: HIVE-21950
> URL: https://issues.apache.org/jira/browse/HIVE-21950
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.1.1
>Reporter: Murshid Chalaev
>Priority: Minor
> Attachments: disabled_CBO.txt, enabled_CBO.txt
>
>
> Queries with SUBSTR function in EXISTS clause have much more complicated 
> execution plan in Hive-2.3 with enabled CBO then in Hive-2.3 with disabled 
> CBO. A query below has 8 stages which submit 4 MR jobs in Hive-2.3 with 
> enabled CBO, while with disabled CBO it has 4 stages and submits 1 MR job.
>  *STEPS TO REPRODUCE:*
> {code:java}
> CREATE TABLE i1122 (id STRING);
> INSERT INTO i1122 VALUES (1),(1001); 
> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );{code}
> *ACTUAL RESULT:*
> Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
> Hive-1.2 with enabled CBO):
> {code:java}
> hive> SET hive.cbo.enable=false;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
> ...
> {code}
> Explain plan in Hive-2.3 with enabled CBO:
> {code:java}
> hive> SET hive.cbo.enable=true;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-3 is a root stage
>   Stage-8 depends on stages: Stage-3
>   Stage-6 depends on stages: Stage-8
>   Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
>   Stage-7 has a backup stage: Stage-1
>   Stage-4 depends on stages: Stage-7
>   Stage-1
>   Stage-0 depends on stages: Stage-4, Stage-1
> STAGE PLANS:
> ...
> {code}
> Full explain plans you can find in attachment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21950) Optimizer complicates execution plan of queries with SUBSTR function in EXISTS clause

2019-07-03 Thread Murshid Chalaev (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Murshid Chalaev updated HIVE-21950:
---
Attachment: disabled_CBO.txt

> Optimizer complicates execution plan of queries with SUBSTR function in 
> EXISTS clause
> -
>
> Key: HIVE-21950
> URL: https://issues.apache.org/jira/browse/HIVE-21950
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.1.1
>Reporter: Murshid Chalaev
>Priority: Minor
> Attachments: disabled_CBO.txt, enabled_CBO.txt
>
>
> Queries with SUBSTR function in EXISTS clause have much more complicated 
> execution plan in Hive-2.3 with enabled CBO then in Hive-2.3 with disabled 
> CBO. A query below has 8 stages which submit 4 MR jobs in Hive-2.3 with 
> enabled CBO, while with disabled CBO it has 4 stages and submits 1 MR job.
>  *STEPS TO REPRODUCE:*
> {code:java}
> CREATE TABLE i1122 (id STRING);
> INSERT INTO i1122 VALUES (1),(1001); 
> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );{code}
> *ACTUAL RESULT:*
> Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
> Hive-1.2 with enabled CBO):
> {code:java}
> hive> SET hive.cbo.enable=false;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
> ...
> {code}
> Explain plan in Hive-2.3 with enabled CBO:
> {code:java}
> hive> SET hive.cbo.enable=true;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-3 is a root stage
>   Stage-8 depends on stages: Stage-3
>   Stage-6 depends on stages: Stage-8
>   Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
>   Stage-7 has a backup stage: Stage-1
>   Stage-4 depends on stages: Stage-7
>   Stage-1
>   Stage-0 depends on stages: Stage-4, Stage-1
> STAGE PLANS:
> ...
> {code}
> Full explain plans you can find in attachment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21950) Optimizer complicates execution plan of queries with SUBSTR function in EXISTS clause

2019-07-03 Thread Murshid Chalaev (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Murshid Chalaev updated HIVE-21950:
---
Description: 
Queries with SUBSTR function in EXISTS clause have much more complicated 
execution plan in Hive-2.3 with enabled CBO then in Hive-2.3 with disabled CBO. 
A query below has 8 stages which submit 4 MR jobs in Hive-2.3 with enabled CBO, 
while with disabled CBO it has 4 stages and submits 1 MR job.

 *STEPS TO REPRODUCE:*
{code:java}
CREATE TABLE i1122 (id STRING);
INSERT INTO i1122 VALUES (1),(1001); 

EXPLAIN
SELECT *
FROM i1122 AS t1
WHERE EXISTS (
SELECT 1
FROM i1122 AS t2
WHERE t2.id = substr(t1.id,4)
);{code}

*ACTUAL RESULT:*
Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
Hive-1.2 with enabled CBO):
{code:java}
hive> SET hive.cbo.enable=false;
hive> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
...
{code}

Explain plan in Hive-2.3 with enabled CBO:

{code:java}
hive> SET hive.cbo.enable=true;
hive> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );
OK
STAGE DEPENDENCIES:
  Stage-3 is a root stage
  Stage-8 depends on stages: Stage-3
  Stage-6 depends on stages: Stage-8
  Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
  Stage-7 has a backup stage: Stage-1
  Stage-4 depends on stages: Stage-7
  Stage-1
  Stage-0 depends on stages: Stage-4, Stage-1

STAGE PLANS:
...
{code}

Full explain plans you can find in attachment

  was:
Queries with SUBSTR function in EXISTS clause have much more complicated 
execution plan in Hive-2.3 then it was in Hive-1.2. A query below has 8 stages 
which submit 4 MR jobs in Hive-2.3, while in Hive-1.2 it has 4 stages and 
submits 1 MR job. Without SUBSTR function or with disabled CBO in Hive-2.3 
execution plan is the same as in Hive-1.2 with enabled CBO.

 *STEPS TO REPRODUCE:*
{code:java}
CREATE TABLE i1122 (id STRING);
INSERT INTO i1122 VALUES (1),(1001); 

EXPLAIN
SELECT *
FROM i1122 AS t1
WHERE EXISTS (
SELECT 1
FROM i1122 AS t2
WHERE t2.id = substr(t1.id,4)
);{code}


> Optimizer complicates execution plan of queries with SUBSTR function in 
> EXISTS clause
> -
>
> Key: HIVE-21950
> URL: https://issues.apache.org/jira/browse/HIVE-21950
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.1.1
>Reporter: Murshid Chalaev
>Priority: Minor
>
> Queries with SUBSTR function in EXISTS clause have much more complicated 
> execution plan in Hive-2.3 with enabled CBO then in Hive-2.3 with disabled 
> CBO. A query below has 8 stages which submit 4 MR jobs in Hive-2.3 with 
> enabled CBO, while with disabled CBO it has 4 stages and submits 1 MR job.
>  *STEPS TO REPRODUCE:*
> {code:java}
> CREATE TABLE i1122 (id STRING);
> INSERT INTO i1122 VALUES (1),(1001); 
> EXPLAIN
> SELECT *
> FROM i1122 AS t1
> WHERE EXISTS (
> SELECT 1
> FROM i1122 AS t2
> WHERE t2.id = substr(t1.id,4)
> );{code}
> *ACTUAL RESULT:*
> Explain plan in Hive-2.3 with disabled CBO(The same execution plan was in 
> Hive-1.2 with enabled CBO):
> {code:java}
> hive> SET hive.cbo.enable=false;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
> ...
> {code}
> Explain plan in Hive-2.3 with enabled CBO:
> {code:java}
> hive> SET hive.cbo.enable=true;
> hive> EXPLAIN
> > SELECT *
> > FROM i1122 AS t1
> > WHERE EXISTS (
> > SELECT 1
> > FROM i1122 AS t2
> > WHERE t2.id = substr(t1.id,4)
> > );
> OK
> STAGE DEPENDENCIES:
>   Stage-3 is a root stage
>   Stage-8 depends on stages: Stage-3
>   Stage-6 depends on stages: Stage-8
>   Stage-5 depends on stages: Stage-6 , consists of Stage-7, Stage-1
>   Stage-7 has a backup stage: Stage-1
>   Stage-4 depends on stages: Stage-7
>   Stage-1
>   Stage-0 depends on stages: Stage-4, Stage-1
> STAGE PLANS:
> ...
> {code}
> Full explain plans you can find in attachment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)