Re: column expression in left outer join for DataFrame

2015-03-25 Thread S Krishna
Hi, Thanks for your response. I modified my code as per your suggestion, but now I am getting a runtime error. Here's my code:

val df_1 = df.filter(df("event") === 0).select("country", "cnt")
val df_2 = df.filter(df("event") === 3).select("country", "cnt")

Re: column expression in left outer join for DataFrame

2015-03-25 Thread Michael Armbrust
Unfortunately you are now hitting a bug (it is fixed in master and will be released in 1.3.1, hopefully next week). However, even with that fix your query is still ambiguous and you will need to use aliases:

val df_1 = df.filter(df("event") === 0).select("country", "cnt").as("a")
val
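For readers following along, here is a rough sketch of how the alias suggestion above might be applied to the whole query. It is not the code from the original message, which is cut off in this archive. The object and method names are hypothetical, the alias "b" is an assumption (only "a" appears above), and it assumes `df` has `event`, `country` and `cnt` columns and that `org.apache.spark.sql.functions.col` is available (Spark 1.3 or later).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical wrapper, used here only so the sketch is self-contained.
object AliasedJoinSketch {
  def joinByCountry(df: DataFrame): DataFrame = {
    // Both sides are derived from the same DataFrame, so each one gets
    // its own alias to keep the "country" references distinguishable.
    val a = df.filter(df("event") === 0).select("country", "cnt").as("a")
    val b = df.filter(df("event") === 3).select("country", "cnt").as("b") // alias "b" is assumed

    // Refer to the columns through the aliases rather than through df_1/df_2.
    b.join(a, col("b.country") === col("a.country"), "left_outer")
  }
}
```

Qualifying each column by its alias is what lets the join condition say which side it means when both sides come from the same source.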

Re: column expression in left outer join for DataFrame

2015-03-25 Thread S Krishna
Hi, Thanks for your response. I am not clear about why the query is ambiguous.

val both = df_2.join(df_1, df_2("country") === df_1("country"), "left_outer")

I thought df_2("country") === df_1("country") indicates that the country field in the two DataFrames should match and df_2("country") is the equivalent of

Re: column expression in left outer join for DataFrame

2015-03-25 Thread Michael Armbrust
That's a good question. In this particular example, it is really only internal implementation details that make it ambiguous. However, fixing this was a very large change, so we have deferred it to Spark 1.4 and instead now print a warning when you construct trivially equal expressions. I can try

Re: column expression in left outer join for DataFrame

2015-03-24 Thread Michael Armbrust
You need to use `===`, so that you are constructing a column expression instead of evaluating the standard Scala equality method. Calling methods to access columns (i.e. df.country) is only supported in Python.

val join_df = df1.join(df2, df1("country") === df2("country"), "left_outer")

On Tue, Mar
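To make the `===` versus `==` distinction concrete, a minimal sketch follows. The object and method names are hypothetical, and it assumes both DataFrames have a `country` column.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical wrapper, used here only so the sketch is self-contained.
object ColumnEqualitySketch {
  def leftOuterByCountry(df1: DataFrame, df2: DataFrame): DataFrame = {
    // `===` is a method on Column: it returns another Column, i.e. an
    // expression that Spark evaluates for each pair of rows during the join.
    val sameCountry: Column = df1("country") === df2("country")

    // `==` is ordinary Scala equality between the two Column objects.
    // It is evaluated immediately and yields a Boolean, not an expression.
    val notAnExpression: Boolean = df1("country") == df2("country")

    df1.join(df2, sameCountry, "left_outer")
  }
}
```

Only the `Column` value can be passed as the join condition; the `Boolean` is computed once, before the join is even planned, and says nothing about the rows.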