[ https://issues.apache.org/jira/browse/SPARK-29147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-29147:
---------------------------------
    Priority: Major  (was: Critical)

> Spark doesn't use shuffleHashJoin as expected
> ---------------------------------------------
>
>                 Key: SPARK-29147
>                 URL: https://issues.apache.org/jira/browse/SPARK-29147
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.3, 2.4.4
>            Reporter: Artsiom Yudovin
>            Priority: Major
>
> I run the following code:
> {code:java}
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>   .appName("ShuffleHashJoin")
>   .master("local[*]")
>   .config("spark.sql.autoBroadcastJoinThreshold", 0)
>   .config("spark.sql.join.preferSortMergeJoin", value = false)
>   .getOrCreate()
> import spark.implicits._
>
> val dataset = Seq(
>   ("1", "playing"),
>   ("2", "with"),
>   ("3", "ShuffledHashJoinExec")
> ).toDF("id", "token")
>
> val dataset1 = Seq(
>   ("1", "playing"),
>   ("2", "with"),
>   ("3", "ShuffledHashJoinExec")
> ).toDF("id1", "token")
>
> dataset.join(dataset1, $"id" === $"id1", "inner").foreach(t => println(t))
> {code}
> My expectation is that Spark will use 'shuffleHashJoin', but I see in the Spark UI and
> in explain() that Spark uses 'sortMergeJoin'.
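
For readers following this issue, the size checks that appear to gate the shuffled hash join in Spark 2.4 can be sketched in plain Scala. This is a simplified model based on a reading of the planner's join selection, not Spark's actual API. The object and method names and the 100-byte sizes are hypothetical, and the real rule also depends on the join type and on whether the join keys are orderable, which this sketch leaves out.

```scala
// Simplified model of the planner's size checks (hypothetical names, not Spark's API).
object JoinChoiceSketch {
  // The build side must be small enough for a per-partition hash map:
  // its estimated size must be below threshold * number of shuffle partitions.
  def canBuildLocalHashMap(sizeInBytes: BigInt,
                           autoBroadcastJoinThreshold: Long,
                           numShufflePartitions: Int): Boolean =
    sizeInBytes < BigInt(autoBroadcastJoinThreshold) * numShufflePartitions

  // The build side must be at least three times smaller than the other side.
  def muchSmaller(buildSize: BigInt, otherSize: BigInt): Boolean =
    buildSize * 3 <= otherSize

  // All three conditions must hold for the shuffled hash join to be considered.
  def picksShuffledHashJoin(preferSortMergeJoin: Boolean,
                            buildSize: BigInt,
                            otherSize: BigInt,
                            autoBroadcastJoinThreshold: Long,
                            numShufflePartitions: Int): Boolean =
    !preferSortMergeJoin &&
      canBuildLocalHashMap(buildSize, autoBroadcastJoinThreshold, numShufflePartitions) &&
      muchSmaller(buildSize, otherSize)

  def main(args: Array[String]): Unit = {
    // Settings from the report: threshold 0, preferSortMergeJoin = false,
    // two inputs of equal size (100 bytes is an assumed placeholder).
    println(picksShuffledHashJoin(preferSortMergeJoin = false,
      buildSize = BigInt(100), otherSize = BigInt(100),
      autoBroadcastJoinThreshold = 0L, numShufflePartitions = 200))
  }
}
```

Under this model, a threshold of 0 makes the first check fail for any input size, and two inputs of equal size fail the second check as well, which would be consistent with the sort merge join seen in explain().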