[ 
https://issues.apache.org/jira/browse/SPARK-31148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanapol Nearunchorn updated SPARK-31148:
----------------------------------------
    Description: 
Given this query

 
{code:java}
select r.id
from sqoop_wongnai.wongnai__w_ref r
inner join sqoop_wongnai.wongnai__w_photo p on p.root_referrer_id = r.id
inner join tests.restaurants_id_export_photos i on i.restaurant_id = 
r.restaurant_id
where r.restaurant_id = 360
{code}
Spark return nothing except when I switch inner join statement between line 3 
and 4 there are results returned.

I guarantee that we have data that match the join condition above.

Here's result from explain extend of above query:

 
{code:java}
== Parsed Logical Plan ==
'Project ['r.id]
+- 'Filter ('r.restaurant_id = 360)
   +- 'Join Inner, ('i.restaurant_id = 'r.restaurant_id)
      :- 'Join Inner, ('p.root_referrer_id = 'r.id)
      :  :- 'SubqueryAlias `r`
      :  :  +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_ref`
      :  +- 'SubqueryAlias `p`
      :     +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_photo`
      +- 'SubqueryAlias `i`
         +- 'UnresolvedRelation `tests`.`restaurants_id_export_photos`

== Analyzed Logical Plan ==
id: bigint
Project [id#1834834L]
+- Filter (restaurant_id#1834836L = cast(360 as bigint))
   +- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
      :- Join Inner, (root_referrer_id#1834858L = id#1834834L)
      :  :- SubqueryAlias `r`
      :  :  +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_ref`
      :  :     +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#1834834L, 
restaurant_id#1834836L,]
      :  +- SubqueryAlias `p`
      :     +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_photo`
      :        +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_photo`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#1834858L, ... 
14 more fields]
      +- SubqueryAlias `i`
         +- SubqueryAlias `tests`.`restaurants_id_export_photos`
            +- HiveTableRelation `tests`.`restaurants_id_export_photos`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]== 
Optimized Logical Plan ==
Project [id#1834834L]
+- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
   :- Project [id#1834834L, restaurant_id#1834836L]
   :  +- Join Inner, (root_referrer_id#1834858L = id#1834834L)
   :     :- Project [id#1834834L, restaurant_id#1834836L]
   :     :  +- Filter ((isnotnull(restaurant_id#1834836L) && 
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
   :     :     +- InMemoryRelation [id#1834834L, restaurant_id#1834836L], 
StorageLevel(disk, memory, deserialized, 1 replicas)
   :     :           +- Scan hive sqoop_wongnai.wongnai__w_ref [id#87357L, 
restaurant_id#87359L], HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L, restaurant_id#87359L]
   :     +- Project [root_referrer_id#1834858L]
   :        +- Filter (isnotnull(root_referrer_id#1834858L) && 
bloomfilter#1835088 of [id#1834834L] filtering [root_referrer_id#1834858L])
   :           +- InMemoryRelation [root_referrer_id#1834858L], 
StorageLevel(disk, memory, deserialized, 1 replicas)
   :                 +- Scan hive sqoop_wongnai.wongnai__w_photo 
[root_referrer_id#82906L], HiveTableRelation 
`sqoop_wongnai`.`wongnai__w_photo`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
   +- Filter ((isnotnull(restaurant_id#1834876L) && (restaurant_id#1834876L = 
360)) && bloomfilter#1835087 of [restaurant_id#1834836L] filtering 
[restaurant_id#1834876L])
      +- HiveTableRelation `tests`.`restaurants_id_export_photos`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]

== Physical Plan ==
*(7) Project [id#1834834L]
+- *(7) SortMergeJoin [restaurant_id#1834836L], [restaurant_id#1834876L], Inner
   :- *(5) Sort [restaurant_id#1834836L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(restaurant_id#1834836L, 200)
   :     +- *(4) Project [id#1834834L, restaurant_id#1834836L]
   :        +- *(4) SortMergeJoin [id#1834834L], [root_referrer_id#1834858L], 
Inner
   :           :- *(2) Sort [id#1834834L ASC NULLS FIRST], false, 0
   :           :  +- Exchange hashpartitioning(id#1834834L, 200)
   :           :     +- *(1) Filter ((isnotnull(restaurant_id#1834836L) && 
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
   :           :        +- *(1) InMemoryTableScan [id#1834834L, 
restaurant_id#1834836L], [isnotnull(restaurant_id#1834836L), 
(restaurant_id#1834836L = 360), isnotnull(id#1834834L)]
   :           :              +- InMemoryRelation [id#1834834L, 
restaurant_id#1834836L], StorageLevel(disk, memory, deserialized, 1 replicas)
   :           :                    +- Scan hive sqoop_wongnai.wongnai__w_ref 
[id#87357L, restaurant_id#87359L], HiveTableRelation 
`sqoop_wongnai`.`wongnai__w_ref`, org.apache.hadoop.hive.serde2.avro.AvroSerDe, 
[id#87357L, restaurant_id#87359L]
   :           +- *(3) Sort [root_referrer_id#1834858L ASC NULLS FIRST], false, 0
   :              +- Exchange hashpartitioning(root_referrer_id#1834858L, 200)
   :                 +- Filter (isnotnull(root_referrer_id#1834858L) && 
bloomfilter#1835088 of [bf1835088 id#1834834L estimatedNumRows=1457534] 
filtering [root_referrer_id#1834858L])
   :                    :  +- GenerateBloomFilter
   :                    :     +- ReusedExchange [id#1834834L, 
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
   :                    +- InMemoryTableScan [root_referrer_id#1834858L], 
[isnotnull(root_referrer_id#1834858L), bloomfilter#1835088 of [bf1835088 
id#1834834L estimatedNumRows=1457534] filtering [root_referrer_id#1834858L]]
   :                          :- InMemoryRelation [root_referrer_id#1834858L], 
StorageLevel(disk, memory, deserialized, 1 replicas)
   :                          :     +- Scan hive sqoop_wongnai.wongnai__w_photo 
[root_referrer_id#82906L], HiveTableRelation 
`sqoop_wongnai`.`wongnai__w_photo`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
   :                          +- GenerateBloomFilter
   :                             +- ReusedExchange [id#1834834L, 
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
   +- *(6) Sort [restaurant_id#1834876L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(restaurant_id#1834876L, 200)
         +- Filter ((isnotnull(restaurant_id#1834876L) && 
(restaurant_id#1834876L = 360)) && bloomfilter#1835087 of [bf1835088 
id#1834834L estimatedNumRows=1457534] filtering [restaurant_id#1834876L])
            :  +- GenerateBloomFilter
            :     +- ReusedExchange [id#1834834L, restaurant_id#1834836L], 
Exchange hashpartitioning(id#1834834L, 200)
            +- Scan hive tests.restaurants_id_export_photos 
[restaurant_id#1834876L], HiveTableRelation 
`tests`.`restaurants_id_export_photos`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
{code}
 

 

 

I noticed that in physical plan

 
{code:java}
bloomfilter#1835087 of [bf1835088 id#1834834L estimatedNumRows=1457534] 
filtering [restaurant_id#1834876L]
{code}
it use bf1835088 id#1834834L for filtering restaurant_id#1834876L and which is 
different from join condition.

I think that the correct Physical query plan should use bf1835087 
restaurant_id#1834836L for filtering restaurant_id#1834876L.

 

You can see full of query plan of both incorrect and correct in this link:

[https://gist.github.com/tanapoln/67a335858dede5eddd9d80659de60d72]

 

Please help.

 

  was:
Given this query

 
{code:java}
select r.id
from sqoop_wongnai.wongnai__w_ref r
inner join sqoop_wongnai.wongnai__w_photo p on p.root_referrer_id = r.id
inner join tests.restaurants_id_export_photos i on i.restaurant_id = 
r.restaurant_id
where r.restaurant_id = 360
{code}
Spark return nothing except when I switch inner join statement between line 3 
and 4 there are results returned.

I guarantee that we have data that match the join condition above.

Here's result from explain extend of above query:

 
{code:java}
== Parsed Logical Plan ==
'Project ['r.id]
+- 'Filter ('r.restaurant_id = 360)
   +- 'Join Inner, ('i.restaurant_id = 'r.restaurant_id)
      :- 'Join Inner, ('p.root_referrer_id = 'r.id)
      :  :- 'SubqueryAlias `r`
      :  :  +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_ref`
      :  +- 'SubqueryAlias `p`
      :     +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_photo`
      +- 'SubqueryAlias `i`
         +- 'UnresolvedRelation `tests`.`restaurants_id_export_photos`== 
Analyzed Logical Plan ==
id: bigint
Project [id#1834834L]
+- Filter (restaurant_id#1834836L = cast(360 as bigint))
   +- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
      :- Join Inner, (root_referrer_id#1834858L = id#1834834L)
      :  :- SubqueryAlias `r`
      :  :  +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_ref`
      :  :     +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#1834834L, 
restaurant_id#1834836L,]
      :  +- SubqueryAlias `p`
      :     +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_photo`
      :        +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_photo`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#1834858L, ... 
14 more fields]
      +- SubqueryAlias `i`
         +- SubqueryAlias `tests`.`restaurants_id_export_photos`
            +- HiveTableRelation `tests`.`restaurants_id_export_photos`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]== 
Optimized Logical Plan ==
Project [id#1834834L]
+- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
   :- Project [id#1834834L, restaurant_id#1834836L]
   :  +- Join Inner, (root_referrer_id#1834858L = id#1834834L)
   :     :- Project [id#1834834L, restaurant_id#1834836L]
   :     :  +- Filter ((isnotnull(restaurant_id#1834836L) && 
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
   :     :     +- InMemoryRelation [id#1834834L, restaurant_id#1834836L], 
StorageLevel(disk, memory, deserialized, 1 replicas)
   :     :           +- Scan hive sqoop_wongnai.wongnai__w_ref [id#87357L, 
restaurant_id#87359L], HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L, restaurant_id#87359L]
   :     +- Project [root_referrer_id#1834858L]
   :        +- Filter (isnotnull(root_referrer_id#1834858L) && 
bloomfilter#1835088 of [id#1834834L] filtering [root_referrer_id#1834858L])
   :           +- InMemoryRelation [root_referrer_id#1834858L], 
StorageLevel(disk, memory, deserialized, 1 replicas)
   :                 +- Scan hive sqoop_wongnai.wongnai__w_photo 
[root_referrer_id#82906L], HiveTableRelation 
`sqoop_wongnai`.`wongnai__w_photo`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
   +- Filter ((isnotnull(restaurant_id#1834876L) && (restaurant_id#1834876L = 
360)) && bloomfilter#1835087 of [restaurant_id#1834836L] filtering 
[restaurant_id#1834876L])
      +- HiveTableRelation `tests`.`restaurants_id_export_photos`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]== 
Physical Plan ==
*(7) Project [id#1834834L]
+- *(7) SortMergeJoin [restaurant_id#1834836L], [restaurant_id#1834876L], Inner
   :- *(5) Sort [restaurant_id#1834836L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(restaurant_id#1834836L, 200)
   :     +- *(4) Project [id#1834834L, restaurant_id#1834836L]
   :        +- *(4) SortMergeJoin [id#1834834L], [root_referrer_id#1834858L], 
Inner
   :           :- *(2) Sort [id#1834834L ASC NULLS FIRST], false, 0
   :           :  +- Exchange hashpartitioning(id#1834834L, 200)
   :           :     +- *(1) Filter ((isnotnull(restaurant_id#1834836L) && 
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
   :           :        +- *(1) InMemoryTableScan [id#1834834L, 
restaurant_id#1834836L], [isnotnull(restaurant_id#1834836L), 
(restaurant_id#1834836L = 360), isnotnull(id#1834834L)]
   :           :              +- InMemoryRelation [id#1834834L, 
restaurant_id#1834836L], StorageLevel(disk, memory, deserialized, 1 replicas)
   :           :                    +- Scan hive sqoop_wongnai.wongnai__w_ref 
[id#87357L, restaurant_id#87359L], HiveTableRelation 
`sqoop_wongnai`.`wongnai__w_ref`, org.apache.hadoop.hive.serde2.avro.AvroSerDe, 
[id#87357L, restaurant_id#87359L]
   :           +- *(3) Sort [root_referrer_id#1834858L ASC NULLS FIRST], false, 0
   :              +- Exchange hashpartitioning(root_referrer_id#1834858L, 200)
   :                 +- Filter (isnotnull(root_referrer_id#1834858L) && 
bloomfilter#1835088 of [bf1835088 id#1834834L estimatedNumRows=1457534] 
filtering [root_referrer_id#1834858L])
   :                    :  +- GenerateBloomFilter
   :                    :     +- ReusedExchange [id#1834834L, 
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
   :                    +- InMemoryTableScan [root_referrer_id#1834858L], 
[isnotnull(root_referrer_id#1834858L), bloomfilter#1835088 of [bf1835088 
id#1834834L estimatedNumRows=1457534] filtering [root_referrer_id#1834858L]]
   :                          :- InMemoryRelation [root_referrer_id#1834858L], 
StorageLevel(disk, memory, deserialized, 1 replicas)
   :                          :     +- Scan hive sqoop_wongnai.wongnai__w_photo 
[root_referrer_id#82906L], HiveTableRelation 
`sqoop_wongnai`.`wongnai__w_photo`, 
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
   :                          +- GenerateBloomFilter
   :                             +- ReusedExchange [id#1834834L, 
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
   +- *(6) Sort [restaurant_id#1834876L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(restaurant_id#1834876L, 200)
         +- Filter ((isnotnull(restaurant_id#1834876L) && 
(restaurant_id#1834876L = 360)) && bloomfilter#1835087 of [bf1835088 
id#1834834L estimatedNumRows=1457534] filtering [restaurant_id#1834876L])
            :  +- GenerateBloomFilter
            :     +- ReusedExchange [id#1834834L, restaurant_id#1834836L], 
Exchange hashpartitioning(id#1834834L, 200)
            +- Scan hive tests.restaurants_id_export_photos 
[restaurant_id#1834876L], HiveTableRelation 
`tests`.`restaurants_id_export_photos`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
{code}
 

 

 

I noticed that in physical plan

 
{code:java}
bloomfilter#1835087 of [bf1835088 id#1834834L estimatedNumRows=1457534] 
filtering [restaurant_id#1834876L]
{code}
it use bf1835088 id#1834834L for filtering restaurant_id#1834876L and which is 
different from join condition.

I think that the correct Physical query plan should use bf1835087 
restaurant_id#1834836L for filtering restaurant_id#1834876L.

 

You can see full of query plan of both incorrect and correct in this link:

[https://gist.github.com/tanapoln/67a335858dede5eddd9d80659de60d72]

 

Please help.

 


> Physical Plan and Optimized Logical Plan are mismatch
> -----------------------------------------------------
>
>                 Key: SPARK-31148
>                 URL: https://issues.apache.org/jira/browse/SPARK-31148
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Tanapol Nearunchorn
>            Priority: Major
>
> Given this query
>  
> {code:java}
> select r.id
> from sqoop_wongnai.wongnai__w_ref r
> inner join sqoop_wongnai.wongnai__w_photo p on p.root_referrer_id = r.id
> inner join tests.restaurants_id_export_photos i on i.restaurant_id = 
> r.restaurant_id
> where r.restaurant_id = 360
> {code}
> Spark return nothing except when I switch inner join statement between line 3 
> and 4 there are results returned.
> I guarantee that we have data that match the join condition above.
> Here's result from explain extend of above query:
>  
> {code:java}
> == Parsed Logical Plan ==
> 'Project ['r.id]
> +- 'Filter ('r.restaurant_id = 360)
>    +- 'Join Inner, ('i.restaurant_id = 'r.restaurant_id)
>       :- 'Join Inner, ('p.root_referrer_id = 'r.id)
>       :  :- 'SubqueryAlias `r`
>       :  :  +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_ref`
>       :  +- 'SubqueryAlias `p`
>       :     +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_photo`
>       +- 'SubqueryAlias `i`
>          +- 'UnresolvedRelation `tests`.`restaurants_id_export_photos`
> == Analyzed Logical Plan ==
> id: bigint
> Project [id#1834834L]
> +- Filter (restaurant_id#1834836L = cast(360 as bigint))
>    +- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
>       :- Join Inner, (root_referrer_id#1834858L = id#1834834L)
>       :  :- SubqueryAlias `r`
>       :  :  +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_ref`
>       :  :     +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`, 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#1834834L, 
> restaurant_id#1834836L,]
>       :  +- SubqueryAlias `p`
>       :     +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_photo`
>       :        +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_photo`, 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#1834858L, ... 
> 14 more fields]
>       +- SubqueryAlias `i`
>          +- SubqueryAlias `tests`.`restaurants_id_export_photos`
>             +- HiveTableRelation `tests`.`restaurants_id_export_photos`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> [restaurant_id#1834876L]== Optimized Logical Plan ==
> Project [id#1834834L]
> +- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
>    :- Project [id#1834834L, restaurant_id#1834836L]
>    :  +- Join Inner, (root_referrer_id#1834858L = id#1834834L)
>    :     :- Project [id#1834834L, restaurant_id#1834836L]
>    :     :  +- Filter ((isnotnull(restaurant_id#1834836L) && 
> (restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
>    :     :     +- InMemoryRelation [id#1834834L, restaurant_id#1834836L], 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>    :     :           +- Scan hive sqoop_wongnai.wongnai__w_ref [id#87357L, 
> restaurant_id#87359L], HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`, 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L, 
> restaurant_id#87359L]
>    :     +- Project [root_referrer_id#1834858L]
>    :        +- Filter (isnotnull(root_referrer_id#1834858L) && 
> bloomfilter#1835088 of [id#1834834L] filtering [root_referrer_id#1834858L])
>    :           +- InMemoryRelation [root_referrer_id#1834858L], 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>    :                 +- Scan hive sqoop_wongnai.wongnai__w_photo 
> [root_referrer_id#82906L], HiveTableRelation 
> `sqoop_wongnai`.`wongnai__w_photo`, 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
>    +- Filter ((isnotnull(restaurant_id#1834876L) && (restaurant_id#1834876L = 
> 360)) && bloomfilter#1835087 of [restaurant_id#1834836L] filtering 
> [restaurant_id#1834876L])
>       +- HiveTableRelation `tests`.`restaurants_id_export_photos`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
> == Physical Plan ==
> *(7) Project [id#1834834L]
> +- *(7) SortMergeJoin [restaurant_id#1834836L], [restaurant_id#1834876L], 
> Inner
>    :- *(5) Sort [restaurant_id#1834836L ASC NULLS FIRST], false, 0
>    :  +- Exchange hashpartitioning(restaurant_id#1834836L, 200)
>    :     +- *(4) Project [id#1834834L, restaurant_id#1834836L]
>    :        +- *(4) SortMergeJoin [id#1834834L], [root_referrer_id#1834858L], 
> Inner
>    :           :- *(2) Sort [id#1834834L ASC NULLS FIRST], false, 0
>    :           :  +- Exchange hashpartitioning(id#1834834L, 200)
>    :           :     +- *(1) Filter ((isnotnull(restaurant_id#1834836L) && 
> (restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
>    :           :        +- *(1) InMemoryTableScan [id#1834834L, 
> restaurant_id#1834836L], [isnotnull(restaurant_id#1834836L), 
> (restaurant_id#1834836L = 360), isnotnull(id#1834834L)]
>    :           :              +- InMemoryRelation [id#1834834L, 
> restaurant_id#1834836L], StorageLevel(disk, memory, deserialized, 1 replicas)
>    :           :                    +- Scan hive sqoop_wongnai.wongnai__w_ref 
> [id#87357L, restaurant_id#87359L], HiveTableRelation 
> `sqoop_wongnai`.`wongnai__w_ref`, 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L, 
> restaurant_id#87359L]
>    :           +- *(3) Sort [root_referrer_id#1834858L ASC NULLS FIRST], 
> false, 0
>    :              +- Exchange hashpartitioning(root_referrer_id#1834858L, 200)
>    :                 +- Filter (isnotnull(root_referrer_id#1834858L) && 
> bloomfilter#1835088 of [bf1835088 id#1834834L estimatedNumRows=1457534] 
> filtering [root_referrer_id#1834858L])
>    :                    :  +- GenerateBloomFilter
>    :                    :     +- ReusedExchange [id#1834834L, 
> restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
>    :                    +- InMemoryTableScan [root_referrer_id#1834858L], 
> [isnotnull(root_referrer_id#1834858L), bloomfilter#1835088 of [bf1835088 
> id#1834834L estimatedNumRows=1457534] filtering [root_referrer_id#1834858L]]
>    :                          :- InMemoryRelation 
> [root_referrer_id#1834858L], StorageLevel(disk, memory, deserialized, 1 
> replicas)
>    :                          :     +- Scan hive 
> sqoop_wongnai.wongnai__w_photo [root_referrer_id#82906L], HiveTableRelation 
> `sqoop_wongnai`.`wongnai__w_photo`, 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
>    :                          +- GenerateBloomFilter
>    :                             +- ReusedExchange [id#1834834L, 
> restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
>    +- *(6) Sort [restaurant_id#1834876L ASC NULLS FIRST], false, 0
>       +- Exchange hashpartitioning(restaurant_id#1834876L, 200)
>          +- Filter ((isnotnull(restaurant_id#1834876L) && 
> (restaurant_id#1834876L = 360)) && bloomfilter#1835087 of [bf1835088 
> id#1834834L estimatedNumRows=1457534] filtering [restaurant_id#1834876L])
>             :  +- GenerateBloomFilter
>             :     +- ReusedExchange [id#1834834L, restaurant_id#1834836L], 
> Exchange hashpartitioning(id#1834834L, 200)
>             +- Scan hive tests.restaurants_id_export_photos 
> [restaurant_id#1834876L], HiveTableRelation 
> `tests`.`restaurants_id_export_photos`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
> {code}
>  
>  
>  
> I noticed that in physical plan
>  
> {code:java}
> bloomfilter#1835087 of [bf1835088 id#1834834L estimatedNumRows=1457534] 
> filtering [restaurant_id#1834876L]
> {code}
> it use bf1835088 id#1834834L for filtering restaurant_id#1834876L and which 
> is different from join condition.
> I think that the correct Physical query plan should use bf1835087 
> restaurant_id#1834836L for filtering restaurant_id#1834876L.
>  
> You can see full of query plan of both incorrect and correct in this link:
> [https://gist.github.com/tanapoln/67a335858dede5eddd9d80659de60d72]
>  
> Please help.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to