Yeah, I think so. This is a common mistake.
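One possible workaround (a sketch, assuming the Spark 1.5+ DataFrame API): rename the duplicate column before the second join so every column name in the joined frame stays unique. The `b2` name below is just an illustrative choice.

```scala
val df1 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
val df2 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")

// df1.join(df2, "a") keeps both df1's "b" and df2's "b", so a later
// join on "b" is ambiguous. Renaming df2's "b" first avoids that:
val df2Renamed = df2.withColumnRenamed("b", "b2")
val df3 = df1.join(df2Renamed, "a")   // columns: a, b, b2
val df4 = df3.join(df2, "b")          // "b" now appears only once in df3
```

Dropping one of the duplicate columns after the first join would work as well; the point is that the intermediate frame must not carry two columns with the same name if you intend to join on that name again.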

// maropu

On Wed, Apr 27, 2016 at 1:05 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> The ambiguity came from:
>
> scala> df3.schema
> res0: org.apache.spark.sql.types.StructType =
> StructType(StructField(a,IntegerType,false),
> StructField(b,IntegerType,false), StructField(b,IntegerType,false))
>
> On Tue, Apr 26, 2016 at 8:54 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> I tried;
>> val df1 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
>> val df2 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
>> val df3 = df1.join(df2, "a")
>> val df4 = df3.join(df2, "b")
>>
>> And I got; org.apache.spark.sql.AnalysisException: Reference 'b' is
>> ambiguous, could be: b#6, b#14.;
>> If it is the same case here, this message makes sense and the cause is clear.
>>
>> Thoughts?
>>
>> // maropu
>>
>> On Wed, Apr 27, 2016 at 6:09 AM, Prasad Ravilla <pras...@slalom.com>
>> wrote:
>>
>>> Also, check the column names of df1 (after joining df2 and df3).
>>>
>>> Prasad.
>>>
>>> From: Ted Yu
>>> Date: Monday, April 25, 2016 at 8:35 PM
>>> To: Divya Gehlot
>>> Cc: "user @spark"
>>> Subject: Re: Cant join same dataframe twice ?
>>>
>>> Can you show us the structure of df2 and df3 ?
>>>
>>> Thanks
>>>
>>> On Mon, Apr 25, 2016 at 8:23 PM, Divya Gehlot <divya.htco...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am using Spark 1.5.2.
>>>> I have a use case where I need to join the same dataframe twice on two
>>>> different columns.
>>>> I am getting a "missing columns" error.
>>>>
>>>> For instance,
>>>> val df1 = df2.join(df3, "Column1")
>>>> The line below then throws the missing-columns error:
>>>> val df4 = df1.join(df3, "Column2")
>>>>
>>>> Is this a bug or a valid scenario?
>>>>
>>>> Thanks,
>>>> Divya
>>>>
>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro