Re: Bug? Can't reference a column by name after joining two DataFrames on a key with the same name

2015-04-23 Thread Yin Huai
Hi Shuai,

You can use as() to create a table alias, for example df1.as("df1"). Then
you can use $"df1.col" to refer to it.
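
Here is a minimal Scala sketch of this approach. It assumes Spark 1.3.x with a plain SQLContext and a shared key column named id, which is the name in the AnalysisException from the original question. The object AliasJoinExample, the sample rows, and the name/amount columns are hypothetical. The helper uses functions.col("df1.id") as an equivalent of $"df1.id", and main also shows the $"..." form:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

object AliasJoinExample {
  // Alias each side with as(...), then qualify the shared key as "df1.id" / "df2.id".
  // The column names id, name and amount are illustrative assumptions.
  def joinOnId(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.as("df1")
      .join(df2.as("df2"), col("df1.id") === col("df2.id"))
      .select(col("df1.id"), col("df1.name"), col("df2.amount"))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("alias-join").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables toDF and the $"..." column syntax

    // Hypothetical sample data: both DataFrames have a long key column named id.
    val df1 = sc.parallelize(Seq((1L, "a"), (2L, "b"))).toDF("id", "name")
    val df2 = sc.parallelize(Seq((1L, 10.0), (3L, 30.0))).toDF("id", "amount")

    // The same join written with the $"..." interpolator.
    val viaDollar = df1.as("df1")
      .join(df2.as("df2"), $"df1.id" === $"df2.id")
      .select($"df1.id")

    joinOnId(df1, df2).show()
    viaDollar.show()
    sc.stop()
  }
}
```

Each side has its own alias, so df1.id and df2.id resolve to different attributes and the select is no longer ambiguous. In Java the $"..." interpolator is not available, so functions.col("df1.id") does the same job.
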

Thanks,

Yin

On Thu, Apr 23, 2015 at 11:14 AM, Shuai Zheng szheng.c...@gmail.com wrote:

 Hi All,

 I am using Spark 1.3.1.

 When I join two DataFrames on a key column that has the same name in both,
 I can't reference the common key by name afterwards.

 Basically:

 select * from t1 inner join t2 on t1.col1 = t2.col1

 I am using only the DataFrame API with a plain SQLContext (not HiveContext):

 DataFrame df3 = df1.join(df2, df1.col(col).equalTo(df2.col(col))).select(col);

 Because df1 and df2 are joined on a key with the same name, I then can't
 reference the key column. I understand I should use a fully qualified name
 for that column (as in SQL, t1.col), but I don't know how to do this in
 Spark SQL.

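If writing the query as a SQL string is acceptable, you can use the qualified form t1.id from the query above directly after registering both DataFrames as temporary tables. Here is a minimal Scala sketch, assuming Spark 1.3.x and a plain SQLContext (no HiveContext needed). The QualifiedSqlExample object, the sample rows, and the id/name/amount columns are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object QualifiedSqlExample {
  // Registers both DataFrames as temp tables so the SQL text can qualify
  // the shared key as t1.id / t2.id. Column names are assumptions.
  def joinViaSql(sqlContext: SQLContext, df1: DataFrame, df2: DataFrame): DataFrame = {
    df1.registerTempTable("t1")
    df2.registerTempTable("t2")
    sqlContext.sql("SELECT t1.id, t1.name, t2.amount FROM t1 JOIN t2 ON t1.id = t2.id")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("qualified-sql").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables toDF on RDDs of tuples

    // Hypothetical sample data with a shared long key column named id.
    val df1 = sc.parallelize(Seq((1L, "a"), (2L, "b"))).toDF("id", "name")
    val df2 = sc.parallelize(Seq((1L, 10.0), (3L, 30.0))).toDF("id", "amount")

    joinViaSql(sqlContext, df1, df2).show()
    sc.stop()
  }
}
```

registerTempTable adds t1 and t2 to the SQLContext's catalog. Registering another DataFrame under the same name replaces the earlier one.
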
 Exception in thread "main" org.apache.spark.sql.AnalysisException:
 Reference 'id' is ambiguous, could be: id#8L, id#0L.;

 It looks like the join key can't be referenced either by name or with the
 df1.col(name) pattern.

 https://issues.apache.org/jira/browse/SPARK-5278 refers to a Hive case, so
 I am not sure whether it is the same issue, but I still see the problem
 with the latest code.

 It looks like the result of the join doesn't keep the parent DataFrame
 information anywhere?

 I checked the ticket https://issues.apache.org/jira/browse/SPARK-6273, but
 I'm not sure whether it is the same issue. Should I open a new ticket for
 this?

 Regards,

 Shuai

RE: Bug? Can't reference a column by name after joining two DataFrames on a key with the same name

2015-04-23 Thread Shuai Zheng
Got it. Thanks! :)
From: Yin Huai [mailto:yh...@databricks.com] 
Sent: Thursday, April 23, 2015 2:35 PM
To: Shuai Zheng
Cc: user
Subject: Re: Bug? Can't reference to the column by name after join two DataFrame on a same name key