You probably don't want null safe equals (<=>) with a left join.
On Mon, Apr 3, 2017 at 5:46 PM gjohnson35 <[email protected]>
wrote:
> The join condition with && is throwing an exception:
>
> val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=>
> baseDF("medical_claim_id")
> && mccDF("medical_claim_detail_id") <=>
> baseDF("medical_claim_detail_id"), "left")
> .join(revCdDF, revCdDF("revenue_code_padded_str") <=>
> mccDF("mcc_code"), "left")
> .select(baseDF("medical_claim_id"),
> baseDF("medical_claim_detail_id"),
> baseDF("revenue_code"), baseDF("rev_code_distinct_count"),
> baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"),
> baseDF("er_visit_flag"), baseDF("observation_stay_flag"),
> revCdDF("rtos_2_code"), revCdDF("rtos_2_hierarchy"))
> .where(revCdDF("rtos_2_code").between(8, 27).isNotNull)
> .groupBy(
> baseDF("medical_claim_id"),
> baseDF("medical_claim_detail_id")
> )
> .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")),
> min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier")))
>
>
> This query runs fine:
>
> val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=>
> baseDF("medical_claim_id"), "left")
> .join(mccDF, mccDF("medical_claim_detail_id") <=>
> baseDF("medical_claim_detail_id"), "left")
> .join(revCdDF, revCdDF("revenue_code_padded_str") <=>
> mccDF("mcc_code"), "left")
> .select(baseDF("medical_claim_id"),
> baseDF("medical_claim_detail_id"),
> baseDF("revenue_code"), baseDF("rev_code_distinct_count"),
> baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"),
> baseDF("er_visit_flag"), baseDF("observation_stay_flag"),
> revCdDF("rtos_2_code"), revCdDF("rtos_2_hierarchy"))
> .where(revCdDF("rtos_2_code").between(8, 27).isNotNull)
> .groupBy(
> baseDF("medical_claim_id"),
> baseDF("medical_claim_detail_id")
> )
> .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")),
> min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier")))
>
> If I remove the multiple Columns in the join and create a join statement
> for
> each one then the exception goes away. Is there a better way to join
> multiple columns?
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-left-join-with-multiple-columns-Join-condition-is-missing-or-trivial-Use-the-CROSS-JOIN-syntax-tp21297.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
>
>