[jira] [Updated] (SPARK-23542) The `where exists' action in optimized logical plan should be optimized

2018-02-28 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-23542:
--
Description: 
The optimized logical plan of query '*select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i)*' is :
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#14 = i#16)
:- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
+- Project [i#16]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
 

But the query of `*select * from tt1 left semi join tt2 on tt2.i = tt1.i*` is :
{noformat}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
 

 So i think the  optimized logical plan of '*select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i)*;` should be further optimization.
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#14 = i#16)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
+- Project [i#16]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
 

  was:
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i)' is :

 
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#14 = i#16)
:- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
+- Project [i#16]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
{noformat}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#14 = i#16)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
+- Project [i#16]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
 


> The `where exists' action in optimized logical plan should be optimized 
> 
>
> Key: SPARK-23542
> URL: https://issues.apache.org/jira/browse/SPARK-23542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> The optimized logical plan of query '*select * from tt1 where exists (select 
> *  from tt2  where tt1.i = tt2.i)*' is :
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
>  
> But the query of `*select * from tt1 left semi join tt2 on tt2.i = tt1.i*` is 
> :
> {noformat}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#22 = i#20)
> :- Filter isnotnull(i#20)
> : +- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
> +- Project [i#22]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
>  
>  So i think the  optimized logical plan of '*select * from tt1 where exists 
> (select *  from tt2  where tt1.i = tt2.i)*;` should be further optimization.
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- Filter isnotnull(i#20)
> : +- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23542) The `where exists' action in optimized logical plan should be optimized

2018-02-28 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-23542:
--
Description: 
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i)' is :

 
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#14 = i#16)
:- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
+- Project [i#16]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
{noformat}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#14 = i#16)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
+- Project [i#16]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
 

  was:
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i)' is :

 
{noformat}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
 

>== Optimized Logical Plan ==
 >Join LeftSemi, (i#143 = i#145)
 >:- MetastoreRelation default, tt1
 >+- MetastoreRelation default, tt2

But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]
{code}
 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 

 


> The `where exists' action in optimized logical plan should be optimized 
> 
>
> Key: SPARK-23542
> URL: https://issues.apache.org/jira/browse/SPARK-23542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> The optimized logical plan of query 'select * from tt1 where exists (select * 
>  from tt2  where tt1.i = tt2.i)' is :
>  
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
> But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
> {noformat}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#22 = i#20)
> :- Filter isnotnull(i#20)
> : +- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
> +- Project [i#22]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
>  
>  So i think the  optimized logical plan of 'select * from tt1 where exists 
> (select *  from tt2  where tt1.i = tt2.i);` should be further optimization.
>  
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- Filter isnotnull(i#20)
> : +- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23542) The `where exists' action in optimized logical plan should be optimized

2018-02-28 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-23542:
--
Description: 
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i)' is :

 
{noformat}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
 

>== Optimized Logical Plan ==
 >Join LeftSemi, (i#143 = i#145)
 >:- MetastoreRelation default, tt1
 >+- MetastoreRelation default, tt2

But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
{code:java}
== Optimized Logical Plan ==
Join LeftSemi, (i#22 = i#20)
:- Filter isnotnull(i#20)
: +- HiveTableRelation `default`.`tt1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
+- Project [i#22]
+- HiveTableRelation `default`.`tt2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]
{code}
 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 

 

  was:
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i);` is :

>== Optimized Logical Plan ==
 >Join LeftSemi, (i#143 = i#145)
 >:- MetastoreRelation default, tt1
 >+- MetastoreRelation default, tt2

But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :

>== Optimized Logical Plan ==
 Join LeftSemi, (i#152 = i#150)
 :- Filter isnotnull(i#150)
 : +- MetastoreRelation default, tt1
 +- Project [i#152|#152]
 +- MetastoreRelation default, tt2

 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 

 


> The `where exists' action in optimized logical plan should be optimized 
> 
>
> Key: SPARK-23542
> URL: https://issues.apache.org/jira/browse/SPARK-23542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> The optimized logical plan of query 'select * from tt1 where exists (select * 
>  from tt2  where tt1.i = tt2.i)' is :
>  
> {noformat}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#22 = i#20)
> :- Filter isnotnull(i#20)
> : +- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
> +- Project [i#22]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]{noformat}
>  
> >== Optimized Logical Plan ==
>  >Join LeftSemi, (i#143 = i#145)
>  >:- MetastoreRelation default, tt1
>  >+- MetastoreRelation default, tt2
> But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
> {code:java}
> == Optimized Logical Plan ==
> Join LeftSemi, (i#22 = i#20)
> :- Filter isnotnull(i#20)
> : +- HiveTableRelation `default`.`tt1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
> +- Project [i#22]
> +- HiveTableRelation `default`.`tt2`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]
> {code}
>  
>  So i think the  optimized logical plan of 'select * from tt1 where exists 
> (select *  from tt2  where tt1.i = tt2.i);` should be further optimization.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23542) The `where exists' action in optimized logical plan should be optimized

2018-02-28 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-23542:
--
Description: 
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i);` is :

>== Optimized Logical Plan ==
 >Join LeftSemi, (i#143 = i#145)
 >:- MetastoreRelation default, tt1
 >+- MetastoreRelation default, tt2

But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :

>== Optimized Logical Plan ==
 Join LeftSemi, (i#152 = i#150)
 :- Filter isnotnull(i#150)
 : +- MetastoreRelation default, tt1
 +- Project [i#152|#152]
 +- MetastoreRelation default, tt2

 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 

 

  was:
The optimized logical plan of query 'select * from tt1 where exists (select *  
from tt2  where tt1.i = tt2.i);` is :

>== Optimized Logical Plan ==
>Join LeftSemi, (i#143 = i#145)
>:- MetastoreRelation default, tt1
>+- MetastoreRelation default, tt2

But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :

>== Optimized Logical Plan ==
 Join LeftSemi, (i#152 = i#150)
 :- Filter isnotnull(i#150)
 : +- MetastoreRelation default, tt1
 +- Project [i#152|#152]
 +- MetastoreRelation default, tt2

 

 So i think the  optimized logical plan of 'select * from tt1 where exists 
(select *  from tt2  where tt1.i = tt2.i);` should be further optimization.

 

== Optimized Logical Plan ==
 Join LeftSemi, (i#143 = i#145)
 :- MetastoreRelation default, tt1
 +- MetastoreRelation default, tt2


> The `where exists' action in optimized logical plan should be optimized 
> 
>
> Key: SPARK-23542
> URL: https://issues.apache.org/jira/browse/SPARK-23542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> The optimized logical plan of query 'select * from tt1 where exists (select * 
>  from tt2  where tt1.i = tt2.i);` is :
> >== Optimized Logical Plan ==
>  >Join LeftSemi, (i#143 = i#145)
>  >:- MetastoreRelation default, tt1
>  >+- MetastoreRelation default, tt2
> But the query of `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is :
> >== Optimized Logical Plan ==
>  Join LeftSemi, (i#152 = i#150)
>  :- Filter isnotnull(i#150)
>  : +- MetastoreRelation default, tt1
>  +- Project [i#152|#152]
>  +- MetastoreRelation default, tt2
>  
>  So i think the  optimized logical plan of 'select * from tt1 where exists 
> (select *  from tt2  where tt1.i = tt2.i);` should be further optimization.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org