I think productBroadcastDF is a broadcast variable in your case, not the DataFrame itself. Try the join with productBroadcastDF.value instead.
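Which of the two it is depends on which broadcast function was called. Here is a minimal sketch to tell them apart; it is not your actual code. The SparkSession setup, the local master, the sample rows and the column names other than productId are my assumptions. org.apache.spark.sql.functions.broadcast returns a DataFrame that carries a broadcast hint and can be joined directly, while SparkContext.broadcast returns a Broadcast[T] wrapper whose payload is reached through .value.

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.x) stands in for your sqlCtx.
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the CSV-backed DataFrames.
    val transactionDF = Seq((1, "p1", 10.0), (2, "p2", 5.0), (3, "p1", 7.5))
      .toDF("txnId", "productId", "amount")
    val productDF = Seq(("p1", "Widget"), ("p2", "Gadget"))
      .toDF("productId", "name")

    // Case 1: functions.broadcast returns a DataFrame with a broadcast hint,
    // so it is passed to join as-is (there is no .value on it).
    val hintedDF: DataFrame = broadcast(productDF)
    val broadcastJoin = transactionDF.join(hintedDF, "productId")

    // Case 2: SparkContext.broadcast returns a Broadcast[T] wrapper;
    // the wrapped object is only reachable through .value.
    val productLookup: Broadcast[Map[String, String]] =
      spark.sparkContext.broadcast(
        productDF.collect().map(r => r.getString(0) -> r.getString(1)).toMap)

    // Plain inner join for comparison; both joins should return the same rows.
    val innerJoin = transactionDF.join(productDF, "productId")

    broadcastJoin.explain() // inspect the physical plan chosen for the join
    println(s"broadcast=${broadcastJoin.count()} inner=${innerJoin.count()}")
    println(productLookup.value.get("p1"))

    spark.stop()
  }
}
```

If the two counts differ on YARN but agree in local mode, comparing the explain() output from both runs may help narrow down where the rows go missing.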

On Wed, Jan 4, 2017 at 1:04 AM, Patrick <titlibat...@gmail.com> wrote:

> Hi,
>
> An update on the above question: in local[*] mode the code works fine.
> The broadcast size is 200MB, but on YARN the broadcast join gives an
> empty result. The SQL query in the UI does show BroadcastHint, though.
>
> Thanks
>
>
> On Fri, Dec 30, 2016 at 9:15 PM, titli batali <titlibat...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have two dataframes which have a common column, Product_Id, on which
>> I have to perform a join operation.
>>
>> val transactionDF = readCSVToDataFrame(sqlCtx: SQLContext,
>> pathToReadTransactions: String, transactionSchema: StructType)
>> val productDF = readCSVToDataFrame(sqlCtx: SQLContext,
>> pathToReadProduct: String, productSchema: StructType)
>>
>> As the transaction data is very large but the product data is small, I
>> would ideally do a broadcast join where I broadcast productDF.
>>
>> val productBroadcastDF = broadcast(productDF)
>> val broadcastJoin = transactionDF.join(productBroadcastDF,
>> "productId")
>>
>> Or simply, val innerJoin = transactionDF.join(productDF, "productId")
>> should give the same result as above.
>>
>> But if I join using the simple inner join I get a dataframe with joined
>> values, whereas if I do the broadcast join I get an empty dataframe
>> with empty values. I am not able to explain this behavior. Ideally both
>> should give the same result.
>>
>> What could have gone wrong? Has anyone faced a similar issue?
>>
>>
>> Thanks,
>> Prateek
>>
>

--
Best Regards,
Ayan Guha