You need to use `===` so that you construct a column expression instead of invoking the standard Scala equality method. Accessing columns as attributes (i.e. `df.country`) is only supported in Python.
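For illustration, here is a minimal sketch of the fix in context, using the Spark 1.3-era SQLContext API. The DataFrame contents and names are hypothetical; any two DataFrames sharing a "country" column would behave the same way.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical local setup; names and data are illustrative only.
val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("join-example"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df1 = Seq(("US", 1), ("FR", 2)).toDF("country", "id")
val df2 = Seq(("US", "dollar")).toDF("country", "currency")

// df1("country") === df2("country") builds a Column expression that Spark
// evaluates per row; plain == would call Scala's equality method on the two
// Column objects and return a Boolean, which is why the compiler complains
// that a Column is required.
val joined = df1.join(df2, df1("country") === df2("country"), "left_outer")

// With a left outer join, rows of df1 with no match in df2 (here "FR")
// keep df1's columns and get null for df2's columns.
joined.show()
```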
val join_df = df1.join(df2, df1("country") === df2("country"), "left_outer")

On Tue, Mar 24, 2015 at 5:50 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> I am trying to port some code that was working in Spark 1.2.0 on the latest
> version, Spark 1.3.0. This code involves a left outer join between two
> SchemaRDDs which I am now trying to change to a left outer join between 2
> DataFrames. I followed the example for left outer join of DataFrame at
>
> https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
>
> Here's my code, where df1 and df2 are the 2 dataframes I am joining on the
> "country" field:
>
> val join_df = df1.join( df2, df1.country == df2.country, "left_outer")
>
> But I got a compilation error that value country is not a member of
> sql.DataFrame
>
> I also tried the following:
> val join_df = df1.join( df2, df1("country") == df2("country"),
> "left_outer")
>
> I got a compilation error that it is a Boolean whereas a Column is
> required.
>
> So what is the correct Column expression I need to provide for joining the
> 2 dataframes on a specific field ?
>
> thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/column-expression-in-left-outer-join-for-DataFrame-tp22209.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org