A workaround is difficult. You should consider merging this PR
<https://github.com/apache/spark/pull/10128> into your Spark build.
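Until that patch is in your build, one common way to sidestep the restriction is to move the nondeterministic expression out of the join condition and into the SELECT list of a subquery, where Project operators do allow rand(). A sketch of your query rewritten that way (the derived column name join_key is just illustrative):

```sql
SELECT a.col1
FROM (
    SELECT col1,
           -- rand() is evaluated in a Project here, which the analyzer permits
           CASE
               WHEN col2 IS NULL
                   THEN cast(rand(9)*1000 - 9999999999 as string)
               ELSE col2
           END AS join_key
    FROM tbl1
) a
LEFT OUTER JOIN tbl2 b
    ON a.join_key = b.col3;
```

The equality in the ON clause is then deterministic with respect to the join inputs, and the salted NULL keys still spread the skewed rows across partitions.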

 "wangshuang [via Apache Spark Developers 
List]" <ml+s1001551n2195...@n3.nabble.com> wrote at 2017-07-13 18:43:

I'm trying to execute Hive SQL on Spark SQL (also on the Spark Thrift Server). To 
mitigate data skew, we use CASE WHEN to handle NULL join keys.
A simple SQL example:


SELECT a.col1
FROM tbl1 a
LEFT OUTER JOIN tbl2 b
    ON CASE
           WHEN a.col2 IS NULL
               THEN cast(rand(9)*1000 - 9999999999 as string)
           ELSE a.col2
       END = b.col3;


But I get this error:

== Physical Plan ==
org.apache.spark.sql.AnalysisException: nondeterministic expressions are only 
allowed in
Project, Filter, Aggregate or Window, found:
 (((CASE WHEN (a.`nav_tcdt` IS NULL) THEN CAST(((rand(9) * CAST(1000 AS 
DOUBLE)) - CAST(9999999999L AS DOUBLE)) AS STRING) ELSE a.`nav_tcdt` END = 
c.`site_categ_id`) AND (CAST(a.`nav_tcd` AS INT) = 9)) AND (c.`cur_flag` = 1))
in operator Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN 
cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as string) 
ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int) = 9)) && 
(cur_flag#77 = 1))
               ;;
GlobalLimit 10
+- LocalLimit 10
   +- Aggregate [date_id#7, CASE WHEN (cast(city_id#10 as string) IN 
(cast(19596 as string),cast(20134 as string),cast(10997 as string)) && 
nav_tcdt#25 RLIKE ^[0-9]+$) THEN city_id#10 ELSE nav_tpa_id#21 END], [date_id#7]
      +- Filter (date_id#7 = 2017-07-12)
         +- Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN 
cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as string) 
ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int) = 9)) && 
(cur_flag#77 = 1))
            :- SubqueryAlias a
            :  +- SubqueryAlias tmp_lifan_trfc_tpa_hive
            :     +- CatalogRelation `tmp`.`tmp_lifan_trfc_tpa_hive`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [date_id#7, chanl_id#8L, 
pltfm_id#9, city_id#10, sessn_id#11, gu_id#12, nav_refer_page_type_id#13, 
nav_refer_page_value#14, nav_refer_tpa#15, nav_refer_tpa_id#16, 
nav_refer_tpc#17, nav_refer_tpi#18, nav_page_type_id#19, nav_page_value#20, 
nav_tpa_id#21, nav_tpa#22, nav_tpc#23, nav_tpi#24, nav_tcdt#25, nav_tcd#26, 
nav_tci#27, nav_tce#28, detl_refer_page_type_id#29, detl_refer_page_value#30, 
... 33 more fields]
            +- SubqueryAlias c
               +- SubqueryAlias dim_site_categ_ext
                  +- CatalogRelation `dw`.`dim_site_categ_ext`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [site_categ_skid#64L, 
site_categ_type#65, site_categ_code#66, site_categ_name#67, 
site_categ_parnt_skid#68L, site_categ_kywrd#69, leaf_flg#70L, sort_seq#71L, 
site_categ_srch_name#72, vsbl_flg#73, delet_flag#74, etl_batch_id#75L, 
updt_time#76, cur_flag#77, bkgrnd_categ_skid#78L, bkgrnd_categ_id#79L, 
site_categ_id#80, site_categ_parnt_id#81]

Does Spark SQL not support CASE WHEN in a JOIN condition? Additionally, my Spark 
version is 2.2.0.
Any help would be greatly appreciated.

If you reply to this email, your message will be added to the discussion below:
http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-case-when-doesn-t-be-supported-in-JOIN-tp21953.html

--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-SQL-Syntax-case-when-doesn-t-be-supported-in-JOIN-tp21960.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
