[ 
https://issues.apache.org/jira/browse/SPARK-14097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-14097.
----------------------------------
    Resolution: Invalid

Is the issue different plans? It really looks hard to read and at least for me 
the point is not clear.

As it seems there has been no interests for an almost year, and I strongly feel 
that no one is going to resolve this issue unless the reporter keeps 
reproducing this. I am resolving this. Please reopen this if anyone can verify 
this.

> Spark SQL Optimization is not consistent
> ----------------------------------------
>
>                 Key: SPARK-14097
>                 URL: https://issues.apache.org/jira/browse/SPARK-14097
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2, 1.6.1
>            Reporter: Gaurav Tiwari
>            Priority: Minor
>
> I am trying to execute a simple query with join on 3 tables. When I look at 
> the execution plan , it varies with position of table in the "from" clause of 
> the query. Execution plan looks more optimized when the position of table 
> with predicates is specified before any other table. 
> a) Original query : 
> select distinct pge.portfolio_code 
> from table1 pge join table2 p 
> on p.perm_group = pge.anc_port_group 
> join table3 uge 
> on p.user_group=uge.anc_user_group 
> where uge.user_name = 'user' and p.perm_type = 'TEST' 
> b) Optimized query (table with predicates is moved ahead): 
> select distinct pge.portfolio_code from table3 uge, table2 p, table1 pge 
> where uge.user_name = 'user' and p.perm_type = 'TEST'  and p.perm_group = 
> pge.anc_port_group  and p.user_group=uge.anc_user_group
> 1) Execution Plan for Original query (a):
> == Parsed Logical Plan ==
> 'Distinct
>  'Project [unresolvedalias('pge.portfolio_code)]
>   'Filter (('uge.user_name = user) && ('p.perm_type = TEST))
>    'Join Inner, Some(('p.user_group = 'uge.anc_user_group))
>     'Join Inner, Some(('p.perm_group = 'pge.anc_port_group))
>      'UnresolvedRelation [table1], Some(pge)
>      'UnresolvedRelation [table2], Some(p)
>     'UnresolvedRelation [table3], Some(uge)
> == Analyzed Logical Plan ==
> portfolio_code: string
> Distinct
>  Project [portfolio_code#7]
>   Filter ((user_name#12 = user) && (perm_type#9 = TEST))
>    Join Inner, Some((user_group#8 = anc_user_group#11))
>     Join Inner, Some((perm_group#10 = anc_port_group#5))
>      Subquery pge
>       Relation[anc_port_group#5,portfolio_name#6,portfolio_code#7] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table1.parquet]
>      Subquery p
>       Relation[user_group#8,perm_type#9,perm_group#10] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table2.parquet]
>     Subquery uge
>      Relation[anc_user_group#11,user_name#12] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table3.parquet]
> == Optimized Logical Plan ==
> Aggregate [portfolio_code#7], [portfolio_code#7]
>  Project [portfolio_code#7]
>   Join Inner, Some((user_group#8 = anc_user_group#11))
>    Project [portfolio_code#7,user_group#8]
>     Join Inner, Some((perm_group#10 = anc_port_group#5))
>      Project [portfolio_code#7,anc_port_group#5]
>       Relation[anc_port_group#5,portfolio_name#6,portfolio_code#7] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table1.parquet]
>      Project [user_group#8,perm_group#10]
>       Filter (perm_type#9 = TEST)
>        Relation[user_group#8,perm_type#9,perm_group#10] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table2.parquet]
>    Project [anc_user_group#11]
>     Filter (user_name#12 = user)
>      Relation[anc_user_group#11,user_name#12] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table3.parquet]
> == Physical Plan ==
> TungstenAggregate(key=[portfolio_code#7], functions=[], 
> output=[portfolio_code#7])
>  TungstenExchange hashpartitioning(portfolio_code#7)
>   TungstenAggregate(key=[portfolio_code#7], functions=[], 
> output=[portfolio_code#7])
>    TungstenProject [portfolio_code#7]
>     BroadcastHashJoin [user_group#8], [anc_user_group#11], BuildRight
>      TungstenProject [portfolio_code#7,user_group#8]
>       BroadcastHashJoin [anc_port_group#5], [perm_group#10], BuildRight
>        ConvertToUnsafe
>         Scan 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table1.parquet][portfolio_code#7,anc_port_group#5]
>        ConvertToUnsafe
>         Project [user_group#8,perm_group#10]
>          Filter (perm_type#9 = TEST)
>           Scan 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table2.parquet][user_group#8,perm_group#10,perm_type#9]
>      ConvertToUnsafe
>       Project [anc_user_group#11]
>        Filter (user_name#12 = user)
>         Scan 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table3.parquet][anc_user_group#11,user_name#12]
> Code Generation: true
> 2) Execution Plan for  Optimized query (b):
> == Parsed Logical Plan ==
> 'Distinct
>  'Project [unresolvedalias('pge.portfolio_code)]
>   'Filter (((('uge.user_name = user) && ('p.perm_type = TEST)) && 
> ('p.perm_group = 'pge.anc_port_group)) && ('p.user_group = 
> 'uge.anc_user_group))
>    'Join Inner, None
>     'Join Inner, None
>      'UnresolvedRelation [table3], Some(uge)
>      'UnresolvedRelation [table2], Some(p)
>     'UnresolvedRelation [table1], Some(pge)
> == Analyzed Logical Plan ==
> portfolio_code: string
> Distinct
>  Project [portfolio_code#7]
>   Filter ((((user_name#12 = user) && (perm_type#9 = TEST)) && (perm_group#10 
> = anc_port_group#5)) && (user_group#8 = anc_user_group#11))
>    Join Inner, None
>     Join Inner, None
>      Subquery uge
>       Relation[anc_user_group#11,user_name#12] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table3.parquet]
>      Subquery p
>       Relation[user_group#8,perm_type#9,perm_group#10] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table2.parquet]
>     Subquery pge
>      Relation[anc_port_group#5,portfolio_name#6,portfolio_code#7] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table1.parquet]
> == Optimized Logical Plan ==
> Aggregate [portfolio_code#7], [portfolio_code#7]
>  Project [portfolio_code#7]
>   Join Inner, Some((perm_group#10 = anc_port_group#5))
>    Project [perm_group#10]
>     Join Inner, Some((user_group#8 = anc_user_group#11))
>      Project [anc_user_group#11]
>       Filter (user_name#12 = user)
>        Relation[anc_user_group#11,user_name#12] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table3.parquet]
>      Project [perm_group#10,user_group#8]
>       Filter (perm_type#9 = TEST)
>        Relation[user_group#8,perm_type#9,perm_group#10] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table2.parquet]
>    Project [portfolio_code#7,anc_port_group#5]
>     Relation[anc_port_group#5,portfolio_name#6,portfolio_code#7] 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table1.parquet]
> == Physical Plan ==
> TungstenAggregate(key=[portfolio_code#7], functions=[], 
> output=[portfolio_code#7])
>  TungstenExchange hashpartitioning(portfolio_code#7)
>   TungstenAggregate(key=[portfolio_code#7], functions=[], 
> output=[portfolio_code#7])
>    TungstenProject [portfolio_code#7]
>     SortMergeJoin [perm_group#10], [anc_port_group#5]
>      TungstenSort [perm_group#10 ASC], false, 0
>       TungstenExchange hashpartitioning(perm_group#10)
>        TungstenProject [perm_group#10]
>         BroadcastHashJoin [anc_user_group#11], [user_group#8], BuildRight
>          ConvertToUnsafe
>           Project [anc_user_group#11]
>            Filter (user_name#12 = user)
>             Scan 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table3.parquet][anc_user_group#11,user_name#12]
>          ConvertToUnsafe
>           Project [perm_group#10,user_group#8]
>            Filter (perm_type#9 = TEST)
>             Scan 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table2.parquet][perm_group#10,user_group#8,perm_type#9]
>      TungstenSort [anc_port_group#5 ASC], false, 0
>       TungstenExchange hashpartitioning(anc_port_group#5)
>        ConvertToUnsafe
>         Scan 
> ParquetRelation[snackfs://tst:9042/aladdin_data_beta/table1.parquet][portfolio_code#7,anc_port_group#5]
> Code Generation: true



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to