[ https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375985#comment-16375985 ]

KaiXinXIaoLei commented on SPARK-23405:
---------------------------------------

I ran `select ls.cs_order_number from ls left semi join catalog_sales cs on
ls.cs_order_number = cs.cs_order_number`, and the optimized logical plan is:

== Optimized Logical Plan ==
Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
:- Project [cs_order_number#1]
:  +- Filter isnotnull(cs_order_number#1)
:     +- MetastoreRelation 100t, ls
+- Project [cs_order_number#22]
   +- Relation[cs_sold_date_sk#5,cs_sold_time_sk#6,cs_ship_date_sk#7,cs_bill_customer_sk#8,cs_bill_cdemo_sk#9,cs_bill_hdemo_sk#10,cs_bill_addr_sk#11,cs_ship_customer_sk#12,cs_ship_cdemo_sk#13,cs_ship_hdemo_sk#14,cs_ship_addr_sk#15,cs_call_center_sk#16,cs_catalog_page_sk#17,cs_ship_mod

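I think the optimized logical plan should also filter out NULL keys on the right (`catalog_sales`) side of the join, as it already does on the left (`ls`) side: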

== Optimized Logical Plan ==
Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
:- Project [cs_order_number#1]
:  +- Filter isnotnull(cs_order_number#1)
:     +- MetastoreRelation 100t, ls
+- Project [cs_order_number#22]
   {color:#FF0000}+- Filter isnotnull(cs_order_number#22){color}
      +- Relation[cs_sold_date_sk#5,cs_sold_time_sk#6,cs_ship_date_sk#7,cs_bill_customer_sk#8,cs_bill_cdemo_sk#9,cs_bill_hdemo_sk#10,cs_bill_addr_sk#11,cs_ship_customer_sk#12,cs_ship_cdemo_sk#13,cs_ship_hdemo_sk#14,cs_ship_addr_sk#15,cs_call_
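The proposed extra filter is safe for a specific reason. A `catalog_sales` row whose `cs_order_number` is NULL can never make the join condition `cs_order_number#1 = cs_order_number#22` true. Dropping such rows from the right side before the join therefore cannot change which `ls` rows survive. The following is a minimal Python sketch of that argument, not Spark's implementation. The helper names (`sql_equals`, `left_semi_join`, `is_not_null`) and the sample keys are hypothetical, and Python's `None` stands in for SQL NULL.

```python
from typing import Iterable, List, Optional

# A row is reduced to its join key; None plays the role of SQL NULL (hypothetical toy model).
Key = Optional[int]


def sql_equals(a: Key, b: Key) -> bool:
    """SQL `=`: any comparison involving NULL is never TRUE, so it cannot satisfy a join condition."""
    return a is not None and b is not None and a == b


def left_semi_join(left: Iterable[Key], right: Iterable[Key]) -> List[Key]:
    """Keep each left row that has at least one matching right row.

    A semi join returns only left rows and never duplicates them,
    even if several right rows match.
    """
    right_rows = list(right)
    return [l for l in left if any(sql_equals(l, r) for r in right_rows)]


def is_not_null(rows: Iterable[Key]) -> List[Key]:
    """Toy equivalent of `Filter isnotnull(key)`."""
    return [r for r in rows if r is not None]


ls = [42]                                    # small table with a single row
catalog_sales = [7, None, 42, None, 42, 9]   # stand-in for the big table, including NULL keys

plain = left_semi_join(ls, catalog_sales)                # plan as currently optimized
pushed = left_semi_join(ls, is_not_null(catalog_sales))  # plan with the proposed right-side filter
print(plain, pushed)
```

Both calls return `[42]`. The second call scans two fewer right-side rows. The sketch only shows that the filter preserves the result. It says nothing about runtime cost in Spark.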

> The task will hang up when a small table left semi join a big table
> -------------------------------------------------------------------
>
>                 Key: SPARK-23405
>                 URL: https://issues.apache.org/jira/browse/SPARK-23405
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: KaiXinXIaoLei
>            Priority: Major
>         Attachments: SQL.png, taskhang up.png
>
>
> # I ran this SQL: `select ls.cs_order_number from ls left semi join
> catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table
> is small, with only one row. The `catalog_sales` table is large, with
> 10 billion rows. The task hangs:
> !taskhang up.png!
>  And the SQL page shows:
> !SQL.png!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
