Re: join two DataFrames, same column name

2015-03-23 Thread Eric Friedman

 You can include * and a column alias in the same select clause
 var df1 = sqlContext.sql(select *, column_id AS table1_id from table1)


FYI, this does not ultimately work as the * still includes column_id and
you cannot have two columns of that name in the joined DataFrame.  So I
ended up aliasing both sides of the join.

On Sun, Mar 22, 2015 at 1:25 PM, Michael Armbrust mich...@databricks.com
wrote:

 You can include * and a column alias in the same select clause
 var df1 = sqlContext.sql(select *, column_id AS table1_id from table1)


 I'm also hoping to resolve SPARK-6376
 https://issues.apache.org/jira/browse/SPARK-6376 before Spark 1.3.1
 which will let you do something like:
 var df1 = sqlContext.sql(select * from table1).as(t1)
 var df2 = sqlContext.sql(select * from table2).as(t2)
 df1.join(df2, df1(column_id) === df2(column_id)).select(t1.column_id)

 Finally, there is SPARK-6380
 https://issues.apache.org/jira/browse/SPARK-6380 that hopes to simplify
 this particular case.

 Michael

 On Sat, Mar 21, 2015 at 3:02 PM, Eric Friedman eric.d.fried...@gmail.com
 wrote:

 I have a couple of data frames that I pulled from SparkSQL and the
 primary key of one is a foreign key of the same name in the other.  I'd
 rather not have to specify each column in the SELECT statement just so that
 I can rename this single column.

 When I try to join the data frames, I get an exception because it finds
 the two columns of the same name to be ambiguous.  Is there a way to
 specify which side of the join comes from data frame A and which comes from
 B?

 var df1 = sqlContext.sql(select * from table1)
 var df2 = sqlContext.sql(select * from table2)

 df1.join(df2, df1(column_id) === df2(column_id))





Re: join two DataFrames, same column name

2015-03-23 Thread Eric Friedman
Michael, thank you for the workaround and for letting me know of the
upcoming enhancements, both of which sound appealing.

On Sun, Mar 22, 2015 at 1:25 PM, Michael Armbrust mich...@databricks.com
wrote:

 You can include * and a column alias in the same select clause
 var df1 = sqlContext.sql(select *, column_id AS table1_id from table1)


 I'm also hoping to resolve SPARK-6376
 https://issues.apache.org/jira/browse/SPARK-6376 before Spark 1.3.1
 which will let you do something like:
 var df1 = sqlContext.sql(select * from table1).as(t1)
 var df2 = sqlContext.sql(select * from table2).as(t2)
 df1.join(df2, df1(column_id) === df2(column_id)).select(t1.column_id)

 Finally, there is SPARK-6380
 https://issues.apache.org/jira/browse/SPARK-6380 that hopes to simplify
 this particular case.

 Michael

 On Sat, Mar 21, 2015 at 3:02 PM, Eric Friedman eric.d.fried...@gmail.com
 wrote:

 I have a couple of data frames that I pulled from SparkSQL and the
 primary key of one is a foreign key of the same name in the other.  I'd
 rather not have to specify each column in the SELECT statement just so that
 I can rename this single column.

 When I try to join the data frames, I get an exception because it finds
 the two columns of the same name to be ambiguous.  Is there a way to
 specify which side of the join comes from data frame A and which comes from
 B?

 var df1 = sqlContext.sql(select * from table1)
 var df2 = sqlContext.sql(select * from table2)

 df1.join(df2, df1(column_id) === df2(column_id))





Re: join two DataFrames, same column name

2015-03-22 Thread Michael Armbrust
You can include * and a column alias in the same select clause
var df1 = sqlContext.sql(select *, column_id AS table1_id from table1)


I'm also hoping to resolve SPARK-6376
https://issues.apache.org/jira/browse/SPARK-6376 before Spark 1.3.1 which
will let you do something like:
var df1 = sqlContext.sql(select * from table1).as(t1)
var df2 = sqlContext.sql(select * from table2).as(t2)
df1.join(df2, df1(column_id) === df2(column_id)).select(t1.column_id)

Finally, there is SPARK-6380
https://issues.apache.org/jira/browse/SPARK-6380 that hopes to simplify
this particular case.

Michael

On Sat, Mar 21, 2015 at 3:02 PM, Eric Friedman eric.d.fried...@gmail.com
wrote:

 I have a couple of data frames that I pulled from SparkSQL and the primary
 key of one is a foreign key of the same name in the other.  I'd rather not
 have to specify each column in the SELECT statement just so that I can
 rename this single column.

 When I try to join the data frames, I get an exception because it finds
 the two columns of the same name to be ambiguous.  Is there a way to
 specify which side of the join comes from data frame A and which comes from
 B?

 var df1 = sqlContext.sql(select * from table1)
 var df2 = sqlContext.sql(select * from table2)

 df1.join(df2, df1(column_id) === df2(column_id))



join two DataFrames, same column name

2015-03-21 Thread Eric Friedman
I have a couple of data frames that I pulled from SparkSQL and the primary
key of one is a foreign key of the same name in the other.  I'd rather not
have to specify each column in the SELECT statement just so that I can
rename this single column.

When I try to join the data frames, I get an exception because it finds the
two columns of the same name to be ambiguous.  Is there a way to specify
which side of the join comes from data frame A and which comes from B?

var df1 = sqlContext.sql(select * from table1)
var df2 = sqlContext.sql(select * from table2)

df1.join(df2, df1(column_id) === df2(column_id))