[ https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375985#comment-16375985 ]
KaiXinXIaoLei commented on SPARK-23405:
---------------------------------------

I ran `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The Optimized Logical Plan is:

== Optimized Logical Plan ==
Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
:- Project [cs_order_number#1]
:  +- Filter isnotnull(cs_order_number#1)
:     +- MetastoreRelation 100t, ls
+- Project [cs_order_number#22]
   +- Relation[cs_sold_date_sk#5,cs_sold_time_sk#6,cs_ship_date_sk#7,cs_bill_customer_sk#8,cs_bill_cdemo_sk#9,cs_bill_hdemo_sk#10,cs_bill_addr_sk#11,cs_ship_customer_sk#12,cs_ship_cdemo_sk#13,cs_ship_hdemo_sk#14,cs_ship_addr_sk#15,cs_call_center_sk#16,cs_catalog_page_sk#17,cs_ship_mod

I think the Optimized Logical Plan should be:

== Optimized Logical Plan ==
Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
:- Project [cs_order_number#1]
:  +- Filter isnotnull(cs_order_number#1)
:     +- MetastoreRelation 100t, ls
+- Project [cs_order_number#22]
   {color:#FF0000}+- Filter isnotnull(cs_order_number#22){color}
      +- Relation[cs_sold_date_sk#5,cs_sold_time_sk#6,cs_ship_date_sk#7,cs_bill_customer_sk#8,cs_bill_cdemo_sk#9,cs_bill_hdemo_sk#10,cs_bill_addr_sk#11,cs_ship_customer_sk#12,cs_ship_cdemo_sk#13,cs_ship_hdemo_sk#14,cs_ship_addr_sk#15,cs_call_

> The task will hang up when a small table left semi join a big table
> -------------------------------------------------------------------
>
>                 Key: SPARK-23405
>                 URL: https://issues.apache.org/jira/browse/SPARK-23405
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: KaiXinXIaoLei
>            Priority: Major
>         Attachments: SQL.png, taskhang up.png
>
>
> # I run a SQL query: `select ls.cs_order_number from ls left semi join
> catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table
> is a small table with one row. The `catalog_sales` table is a big
> table with 10 billion rows. The task hangs:
> !taskhang up.png!
> And the SQL page is:
> !SQL.png!
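
The Filter proposed in the comment drops right-side rows whose cs_order_number is NULL. Under SQL equality a NULL key never matches anything, so removing those rows should not change what the semi join returns. Below is a minimal sketch of that reasoning only. It assumes toy data and uses Python's sqlite3 module rather than Spark; SQLite has no LEFT SEMI JOIN, so the join is written with EXISTS, and the helper name semi_join_keys is hypothetical. It says nothing about how Spark's optimizer behaves or about what causes the hang.

```python
import sqlite3


def semi_join_keys(conn, filter_right_nulls):
    """Return the ls keys kept by a semi join, written as EXISTS.

    When filter_right_nulls is True, the right side is first restricted
    to rows with a non-NULL key, mirroring the proposed isnotnull Filter.
    """
    right = "catalog_sales"
    if filter_right_nulls:
        right = (
            "(SELECT cs_order_number FROM catalog_sales "
            "WHERE cs_order_number IS NOT NULL)"
        )
    sql = (
        "SELECT ls.cs_order_number FROM ls "
        "WHERE EXISTS (SELECT 1 FROM " + right + " cs "
        "WHERE ls.cs_order_number = cs.cs_order_number) "
        "ORDER BY ls.cs_order_number"
    )
    return [row[0] for row in conn.execute(sql)]


# Toy tables standing in for the real ones (assumed data, not from the issue).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ls (cs_order_number INTEGER)")
conn.execute("CREATE TABLE catalog_sales (cs_order_number INTEGER)")
conn.executemany("INSERT INTO ls VALUES (?)", [(1,), (2,), (None,)])
conn.executemany(
    "INSERT INTO catalog_sales VALUES (?)", [(1,), (1,), (None,), (3,)]
)

unfiltered = semi_join_keys(conn, filter_right_nulls=False)
filtered = semi_join_keys(conn, filter_right_nulls=True)
print(unfiltered, filtered)
```

Both calls return the same keys on this data, which is why pushing an isnotnull Filter below the right-side Project looks semantically safe; whether it would also avoid the hang is a separate question.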