Got it. Thanks! :)
From: Yin Huai [mailto:yh...@databricks.com]
Sent: Thursday, April 23, 2015 2:35 PM
To: Shuai Zheng
Cc: user
Subject: Re: Bug? Can't reference to the column by name after join two
DataFrame on a same name key
Hi Shuai,
You can use as to create a table alias. For example, df1.as("df1"). Then you
can use $"df1.col" to refer to it.
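A minimal sketch of the alias approach, assuming Spark 1.3.x with a plain SQLContext in local mode. The sample data and the column names id, left_val, and right_val are hypothetical and only there for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object AliasJoinExample {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("alias-join").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables toDF and the $"..." column syntax

    // Two hypothetical DataFrames that share the key column name "id".
    val df1 = sc.parallelize(Seq((1L, "a"), (2L, "b"))).toDF("id", "left_val")
    val df2 = sc.parallelize(Seq((1L, "x"), (3L, "y"))).toDF("id", "right_val")

    // Alias each side, then qualify the key with the alias in the join condition.
    val joined = df1.as("df1").join(df2.as("df2"), $"df1.id" === $"df2.id")

    // Selecting the qualified name avoids the ambiguous reference to "id".
    joined.select($"df1.id", $"left_val", $"right_val").show()

    sc.stop()
  }
}
```

From Java, the same idea would presumably be df1.as("df1") together with functions.col("df1.id") in place of the $"..." syntax.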
Thanks,
Yin
On Thu, Apr 23, 2015 at 11:14 AM, Shuai Zheng szheng.c...@gmail.com wrote:
Hi All,
I am using Spark 1.3.1.
When I join two DataFrames on a key column that has the same name in both,
I can't reference that common key by name afterwards.
Basically:
select * from t1 inner join t2 on t1.col1 = t2.col1
I am using only the DataFrame API, with SQLContext rather than HiveContext:
DataFrame df3 = df1.join(df2, df1.col(col).equalTo(df2.col(col))).select(col);
Because df1 and df2 are joined on the same key, col, I can't reference the
key column afterwards. I understand that I should use a fully qualified name
for that column (as in SQL, where I would use t1.col), but I don't know how to
do that in Spark SQL.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference
'id' is ambiguous, could be: id#8L, id#0L.;
It looks like the join key can't be referenced by name or with the df1.col
pattern. https://issues.apache.org/jira/browse/SPARK-5278 refers to a Hive
case, so I am not sure whether it is the same issue, but I still see the
problem with the latest code.
It looks like the result of the join doesn't keep the parent DataFrame
information anywhere?
I checked the ticket https://issues.apache.org/jira/browse/SPARK-6273 but am
not sure whether it is the same issue. Should I open a new ticket for this?
Regards,
Shuai