[ https://issues.apache.org/jira/browse/SPARK-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651980#comment-15651980 ]
Srinath commented on SPARK-18390:
---------------------------------

FYI, these are in branch 2.1:

{noformat}
commit e6132a6cf10df8b12af8dd8d1a2c563792b5cc5a
Author: Srinath Shankar <srin...@databricks.com>
Date:   Sat Sep 3 00:20:43 2016 +0200

    [SPARK-17298][SQL] Require explicit CROSS join for cartesian products
{noformat}

and

{noformat}
commit 2d96d35dc0fed6df249606d9ce9272c0f0109fa2
Author: Srinath Shankar <srin...@databricks.com>
Date:   Fri Oct 14 18:24:47 2016 -0700

    [SPARK-17946][PYSPARK] Python crossJoin API similar to Scala
{noformat}

With the above 2 changes, if a user requests a cross join (with the crossJoin API), the join will always be performed regardless of the physical plan chosen.

> Optimized plan tried to use Cartesian join when it is not enabled
> -----------------------------------------------------------------
>
>                 Key: SPARK-18390
>                 URL: https://issues.apache.org/jira/browse/SPARK-18390
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Xiangrui Meng
>            Assignee: Srinath
>
> {code}
> val df2 = spark.range(1e9.toInt).withColumn("one", lit(1))
> val df3 = spark.range(1e9.toInt)
> df3.join(df2, df3("id") === df2("one")).count()
> {code}
> throws
> bq. org.apache.spark.sql.AnalysisException: Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;
> This is probably not the right behavior because it was not the user who suggested using cartesian product. SQL picked it while knowing it is not enabled.
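
For illustration only, here is a minimal sketch of what requesting the cross join explicitly might look like with the Scala crossJoin API mentioned in the comment. It assumes Spark 2.1 or later and a local SparkSession. The object name CrossJoinSketch, the helper matchedRows, and the small row counts (standing in for the 1e9-row DataFrames in the report) are hypothetical and not taken from the issue.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical wrapper object; not part of Spark or of the original report.
object CrossJoinSketch {

  // Builds small stand-ins for df2 and df3 from the report and joins them
  // with an explicit cross join, applying the original predicate as a filter.
  def matchedRows(spark: SparkSession): Long = {
    val df2 = spark.range(3).withColumn("one", lit(1))
    val df3 = spark.range(4)

    // Dataset.crossJoin states the user's intent to compute a cartesian
    // product, so spark.sql.crossJoin.enabled does not need to be set here.
    df3.crossJoin(df2)
      .where(df3("id") === df2("one"))
      .count()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("cross-join-sketch")
      .master("local[1]")
      .getOrCreate()

    println(s"matched rows: ${matchedRows(spark)}")
    spark.stop()
  }
}
```

This covers only the explicit-request path described in the comment. It does not change how the original df3.join(df2, ...) call in the report is handled.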