[ https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414360#comment-15414360 ]
Davies Liu commented on SPARK-16093: ------------------------------------ Could you `set spark.sql.autoBroadcastJoinThreshold` to verify that the config was correctly set? > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > -------------------------------------------------------------------------- > > Key: SPARK-16093 > URL: https://issues.apache.org/jira/browse/SPARK-16093 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Attachments: Errorlog.txt > > > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > Precondition: > set spark.sql.autoBroadcastJoinThreshold = 1; > Testcase: > "INSERT OVERWRITE TABLE > RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION > (p_event_date='2016-06-18') > select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from > (select app_key,app_channel,lps_did from > RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a > join > (select app_key,lps_did,device_model, count(1) as visits from > RPS__H_REPORT_MORE_DIMENSION_SMALL where p_event_date = '2016-06-18' > and ( log_type=1 or log_type=2) > group by app_key,lps_did,device_model) b > on a.lps_did = b.lps_did and a.app_key=b.app_key > group by a.app_key,a.app_channel,b.device_model; > " > == Physical Plan == > InsertIntoHiveTable MetastoreRelation default, > rps__h_report_more_dimension_first_channel_visit_cd_day, None, > Map(p_event_date -> Some(2016-06-18)), true, false > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Final,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L]) > +- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, > app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle > partition size: 500000000]) > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,sum#41L]) > +- Project [app_key#7,app_channel#9,device_model#20,visits#3L] > +- BroadcastHashJoin [lps_did#8,app_key#7], > [lps_did#13,app_key#12], Inner, BuildRight, None > :- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8)) > : +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], > MetastoreRelation default, > rps__h_report_more_dimension_earliest_newuser_list_c, None > +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > string], input[0, string])) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Final,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,visits#3L]) > +- Exchange(coordinator id: 733045095) > hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), > Some(coordinator[target post-shuffle partition size: 500000000]) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Partial,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,count#39L]) > +- Project [app_key#12,lps_did#13,device_model#20] > +- Filter ((isnotnull(app_key#12) && > isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || > (cast(log_type#11 as int) = 2))) > +- HiveTableScan > [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation > default, rps__h_report_more_dimension_small, None, > [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)] > Time taken: 4.775 seconds, Fetched 1 row(s) > 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s) > Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], > Inner, BuildRight, None > Result: > 1. Execution failed, spark service is unavailable. > 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, > BroadcastHashJoin has been used when join two large tables. > Error log is as attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org