I found this thread because I am having the same issue. My usage is exactly the same as in Michael's join example. I tried executing a SQL statement against the joined data set, which has two columns with the same name, and tried to disambiguate the column name with the table alias, but I still got an "Unresolved attributes" error back. Is there any way around this, short of renaming the columns in the join sources?
Thanks
-Terry

Michael Armbrust wrote

Yes, but if both tagCollection and selectedVideos have a column named "id" then Spark SQL does not know which one you are referring to in the where clause. Here's an example with aliases:

    val x = testData2.as('x)
    val y = testData2.as('y)
    val join = x.join(y, Inner, Some("x.a".attr === "y.a".attr))

On Wed, Jul 16, 2014 at 2:47 AM, Jaonary Rabarisoa < jaonary@ > wrote:

My query is just a simple query that use the spark sql dsl :

    tagCollection.join(selectedVideos).where('videoId === 'id)

On Tue, Jul 15, 2014 at 6:03 PM, Yin Huai < huaiyin.thu@ > wrote:

Hi Jao,

Seems the SQL analyzer cannot resolve the references in the Join condition. What is your query? Did you use the Hive Parser (your query was submitted through hql(...)) or the basic SQL Parser (your query was submitted through sql(...)).

Thanks,
Yin

On Tue, Jul 15, 2014 at 8:52 AM, Jaonary Rabarisoa < jaonary@ > wrote:

Hi all,

When running a join operation with Spark SQL I got the following error :

    Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Ambiguous references to id: (id#303,List()),(id#0,List()), tree:
    Filter ('videoId = 'id)
     Join Inner, None
      ParquetRelation data/tags.parquet
      Filter (name#1 = P1/cam1)
       ParquetRelation data/videos.parquet

What does it mean ?

Cheers,
jao