[ https://issues.apache.org/jira/browse/SPARK-25961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680152#comment-16680152 ]
Kris Mok commented on SPARK-25961: ---------------------------------- It looks like the current restriction makes sense, because the expressions in join condition may eventually be evaluated multiple times depending on which physical join operator is chosen. It doesn't make a lot of sense to allow non-deterministic expression directly in the Join operator. Instead, if we have to support having non-deterministic expression in the join condition and retain an "evaluated-once" semantic, it'd be better to have a rule in the Analyzer to extract non-deterministic expressions from the join condition and put it into a child Project operator on the appropriate side. [~zengxl] does that make sense to you? > Random numbers are not supported when handling data skew > -------------------------------------------------------- > > Key: SPARK-25961 > URL: https://issues.apache.org/jira/browse/SPARK-25961 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.3.1 > Environment: spark on yarn 2.3.1 > Reporter: zengxl > Priority: Major > > my query sql use two table join,one table join key has null value,i use rand > value instead of null value,but has error,the error info as follows: > Error in query: nondeterministic expressions are only allowed in > Project, Filter, Aggregate or Window, found > > > scan spark source code is > org.apache.spark.sql.catalyst.analysis.CheckAnalysis check sql, because the > number of random variables is uncertain, it is prohibited > case o if o.expressions.exists(!_.deterministic) && > !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] && > !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] => > // The rule above is used to check Aggregate operator. > failAnalysis( > s"""nondeterministic expressions are only allowed in > |Project, Filter, Aggregate or Window, found:| > |${o.expressions.map(_.sql).mkString(",")}| > |in operator ${operator.simpleString} > """.stripMargin)| > > Is it possible to add Join to this code? It's not yet tested.And whether > there will be other effects > case o if o.expressions.exists(!_.deterministic) && > !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] && > !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] +{color:#d04437}&& > !o.isInstanceOf[Join]{color}+ => > // The rule above is used to check Aggregate operator. > failAnalysis( > s"""nondeterministic expressions are only allowed in > |Project, Filter, Aggregate or Window or Join, found:| > |${o.expressions.map(_.sql).mkString(",")}| > |in operator ${operator.simpleString} > """.stripMargin)| > > this is my sparksql: > SELECT > T1.CUST_NO AS CUST_NO , > T3.CON_LAST_NAME AS CUST_NAME , > T3.CON_SEX_MF AS SEX_CODE , > T3.X_POSITION AS POST_LV_CODE > FROM tmp.ICT_CUST_RANGE_INFO T1 > LEFT join tmp.F_CUST_BASE_INFO_ALL T3 ON CASE WHEN coalesce(T1.CUST_NO,'') > ='' THEN concat('cust_no',RAND()) ELSE T1.CUST_NO END = T3.BECIF and > T3.DATE='20181105' > WHERE T1.DATE='20181105' -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org