[jira] [Commented] (SPARK-25961) Random numbers are not supported when handling data skew

zengxl (JIRA) Thu, 08 Nov 2018 21:54:56 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-25961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680908#comment-16680908
 ]


zengxl commented on SPARK-25961:
--------------------------------

I'm not asking for an investigation. My question is why sparksql does not 
support join  when using random numbers? Hive supports random numbers when 
join, which results in incompatibility between sparksql and hive

> Random numbers are not supported when handling data skew
> --------------------------------------------------------
>
>                 Key: SPARK-25961
>                 URL: https://issues.apache.org/jira/browse/SPARK-25961
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>         Environment: spark on yarn 2.3.1
>            Reporter: zengxl
>            Priority: Major
>
> My SQL query uses two tables to join. One table join key has null value. I 
> use random value instead of {{null}};however, it has error. The error as 
> follows:
> {code}
> Error in query: nondeterministic expressions are only allowed in
> Project, Filter, Aggregate or Window, found
> scan spark source code is 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis check sql, because the 
> number of random variables is uncertain, it is prohibited
> {code}
> {code}
> case o if o.expressions.exists(!_.deterministic) &&
>  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
>  // The rule above is used to check Aggregate operator.
>  failAnalysis(
>  s"""nondeterministic expressions are only allowed in
> |Project, Filter, Aggregate or Window, found:|
> |${o.expressions.map(_.sql).mkString(",")}|
> |in operator ${operator.simpleString}
>  """.stripMargin)|
> {code}
>  
> Is it possible to add Join to this code? It's not yet tested.And whether 
> there will be other effects
> {code}
> case o if o.expressions.exists(!_.deterministic) &&
>  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] +{color:#d04437}&& 
> !o.isInstanceOf[Join]{color}+ =>
>  // The rule above is used to check Aggregate operator.
>  failAnalysis(
>  s"""nondeterministic expressions are only allowed in
> |Project, Filter, Aggregate or Window or Join, found:|
> |${o.expressions.map(_.sql).mkString(",")}|
> |in operator ${operator.simpleString}
>  """.stripMargin)|
> {code}
> This is my SQL query is as below:
> {code}
> SELECT T1.CUST_NO AS CUST_NO,
>        T3.CON_LAST_NAME AS CUST_NAME,
>        T3.CON_SEX_MF AS SEX_CODE,
>        T3.X_POSITION AS POST_LV_CODE
> FROM tmp.ICT_CUST_RANGE_INFO T1
> LEFT JOIN tmp.F_CUST_BASE_INFO_ALL T3 ON CASE
>                                              WHEN coalesce(T1.CUST_NO, '') 
> ='' THEN concat('cust_no', RAND())
>                                              ELSE T1.CUST_NO
>                                          END = T3.BECIF
> AND T3.DATE='20181105'
> WHERE T1.DATE='20181105'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-25961) Random numbers are not supported when handling data skew

Reply via email to