You need to use `===` so that you construct a column expression instead of invoking the standard Scala equality method. Accessing columns as attributes (i.e. `df.country`) is only supported in Python.
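For illustration, here is a minimal sketch of the fix in context, using the Spark 1.3-era SQLContext API. The DataFrame contents and names are hypothetical; any two DataFrames sharing a "country" column would behave the same way.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical local setup; names and data are illustrative only.
val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("join-example"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df1 = Seq(("US", 1), ("FR", 2)).toDF("country", "id")
val df2 = Seq(("US", "dollar")).toDF("country", "currency")

// df1("country") === df2("country") builds a Column expression that Spark
// evaluates per row; plain == would call Scala's equality method on the two
// Column objects and return a Boolean, which is why the compiler complains
// that a Column is required.
val joined = df1.join(df2, df1("country") === df2("country"), "left_outer")

// With a left outer join, rows of df1 with no match in df2 (here "FR")
// keep df1's columns and get null for df2's columns.
joined.show()
```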
val join_df = df1.join(df2, df1("country") === df2("country"), "left_outer")

On Tue, Mar 24, 2015 at 5:50 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> I am trying to port some code that was working in Spark 1.2.0 on the latest
> version, Spark 1.3.0. This code involves a left outer join between two
> SchemaRDDs which I am now trying to change to a left outer join between 2
> DataFrames. I followed the example for left outer join of DataFrame at
>
> https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
>
> Here's my code, where df1 and df2 are the 2 dataframes I am joining on the
> "country" field:
>
> val join_df = df1.join( df2, df1.country == df2.country, "left_outer")
>
> But I got a compilation error that value country is not a member of
> sql.DataFrame
>
> I also tried the following:
> val join_df = df1.join( df2, df1("country") == df2("country"),
> "left_outer")
>
> I got a compilation error that it is a Boolean whereas a Column is
> required.
>
> So what is the correct Column expression I need to provide for joining the
> 2 dataframes on a specific field ?
>
> thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/column-expression-in-left-outer-join-for-DataFrame-tp22209.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org