I think productBroadcastDF is a broadcast variable in your case, not the DF
itself. Try the join with productBroadcastDF.value.
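
Which of the two it is depends on which broadcast was called, so it may be worth checking. Below is a minimal sketch of the distinction, assuming Spark 2.x with a SparkSession. The object name, the method name, the sample rows and the "amount"/"name" columns are made up for illustration; only transactionDF, productDF and "productId" come from the original post. org.apache.spark.sql.functions.broadcast returns a DataFrame that carries a broadcast hint and can be joined directly, while SparkContext.broadcast returns a Broadcast[T] wrapper whose payload is reached through .value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper object, not part of the original code.
object BroadcastJoinCheck {

  // Returns (row count of the hinted join, row count of the plain join).
  def joinCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Made-up stand-ins for the two CSV-backed DataFrames.
    val transactionDF = Seq((1, 10.0), (2, 20.0), (1, 5.0)).toDF("productId", "amount")
    val productDF = Seq((1, "apple"), (2, "pear")).toDF("productId", "name")

    // functions.broadcast only marks the DataFrame with a hint;
    // the result is still a DataFrame, so no .value is involved.
    val hintedJoin = transactionDF.join(broadcast(productDF), "productId")
    val plainJoin = transactionDF.join(productDF, "productId")

    // SparkContext.broadcast is the other kind: it ships a plain value
    // (here a Map collected on the driver) and is read through .value.
    val productNames = spark.sparkContext.broadcast(
      productDF.collect().map(r => r.getInt(0) -> r.getString(1)).toMap)
    require(productNames.value.size == 2)

    // Print the physical plan to see which join strategy was chosen.
    hintedJoin.explain()

    (hintedJoin.count(), plainJoin.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-join-check")
      .getOrCreate()
    val (hinted, plain) = joinCounts(spark)
    println(s"hinted join rows = $hinted, plain join rows = $plain")
    spark.stop()
  }
}
```

If the two counts differ on the cluster but not locally, comparing the explain() output of both joins and the schema of each side would be a reasonable next step.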

On Wed, Jan 4, 2017 at 1:04 AM, Patrick <titlibat...@gmail.com> wrote:

> Hi,
>
> An update on the above question: in local[*] mode the code works fine. The
> broadcast size is 200 MB, but on YARN the broadcast join gives an empty
> result. The SQL query in the UI does show BroadcastHint, though.
>
> Thanks
>
>
> On Fri, Dec 30, 2016 at 9:15 PM, titli batali <titlibat...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have two DataFrames that share a common column, Product_Id, on which I
>> have to perform a join operation.
>>
>>     val transactionDF = readCSVToDataFrame(sqlCtx: SQLContext,
>> pathToReadTransactions: String, transactionSchema: StructType)
>>     val productDF = readCSVToDataFrame(sqlCtx: SQLContext,
>> pathToReadProduct:String, productSchema: StructType)
>>
>> As the transaction data is very large but the product data is small, I would
>> ideally do a broadcast join where I broadcast productDF.
>>
>>      val productBroadcastDF =  broadcast(productDF)
>>      val broadcastJoin = transactionDF.join(productBroadcastDF,
>> "productId")
>>
>> Or simply, val innerJoin = transactionDF.join(productDF, "productId")
>> should give the same result as above.
>>
>> But if I use a simple inner join, I get a DataFrame with joined values,
>> whereas if I do the broadcast join, I get an empty DataFrame. I am not able
>> to explain this behavior. Ideally both should give the same result.
>>
>> What could have gone wrong? Has anyone faced a similar issue?
>>
>>
>> Thanks,
>> Prateek
>>
>>
>>
>>
>>
>
>


-- 
Best Regards,
Ayan Guha
