Stephen Link created SPARK-10933:
------------------------------------

Summary: Spark SQL Joins should have option to fail query when row multiplication is encountered
Key: SPARK-10933
URL: https://issues.apache.org/jira/browse/SPARK-10933
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Stephen Link
Priority: Minor
When constructing Spark SQL queries, we commonly run into scenarios where users have inadvertently caused a Cartesian product or row expansion in a join. It is sometimes possible to detect this in advance with separate queries, but it would be much better to have a setting that disallows a join key from appearing multiple times on both sides of a join. This setting would belong in SQLConf. The functionality could likely be implemented by forcing a sorted shuffle and then checking for duplicate keys on the streamed results.
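The proposed check (sort both sides by join key, then flag any key that occurs more than once on both sides while streaming through the sorted runs) can be sketched outside Spark as a plain sort-merge join over in-memory lists. This is a minimal, hypothetical illustration only: `sort_merge_join_checked`, `RowMultiplicationError`, and the `fail_on_multiplication` parameter are invented names, not Spark APIs. In Spark itself the check would have to live in the join operators, with the flag read from SQLConf. Note that one-to-many matches are still allowed; only many-to-many matches, which multiply rows, are rejected.

```python
from typing import Any, List, Sequence, Tuple


class RowMultiplicationError(Exception):
    """Hypothetical error raised when a join key repeats on both sides."""


def sort_merge_join_checked(
    left: Sequence[Tuple[Any, Any]],
    right: Sequence[Tuple[Any, Any]],
    fail_on_multiplication: bool = True,
) -> List[Tuple[Any, Any, Any]]:
    """Inner-join (key, value) pairs; optionally fail on many-to-many keys."""
    # Sorting by key stands in for Spark's sorted shuffle.
    left_sorted = sorted(left, key=lambda kv: kv[0])
    right_sorted = sorted(right, key=lambda kv: kv[0])
    out: List[Tuple[Any, Any, Any]] = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk, rk = left_sorted[i][0], right_sorted[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left_sorted) and left_sorted[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][0] == rk:
                j_end += 1
            n_left, n_right = i_end - i, j_end - j
            # Duplicates on both sides mean the output would multiply rows.
            if fail_on_multiplication and n_left > 1 and n_right > 1:
                raise RowMultiplicationError(
                    f"join key {lk!r} appears {n_left} times on the left "
                    f"and {n_right} times on the right"
                )
            # Emit the cross product of the two runs for this key.
            for _, lv in left_sorted[i:i_end]:
                for _, rv in right_sorted[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```

For example, joining `[(1, "a")]` with `[(1, "x"), (1, "y")]` succeeds and returns two rows, while joining `[(1, "a"), (1, "b")]` with `[(1, "x"), (1, "y")]` raises unless `fail_on_multiplication=False` is passed. Because both inputs are already sorted, the duplicate check costs only a comparison per row on top of the merge itself, which is why piggybacking on a sorted shuffle is attractive.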