I collected the small DF to an array of tuple3, then registered a UDF with a function that does a lookup in that array. Then I just run a select which uses the UDF.

On Dec 18, 2015 1:06 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:
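A minimal sketch of that approach, with the Spark-specific steps shown in comments (the function name `rangeName`, the hard-coded ranges, and the table name `events` are illustrative assumptions, not from the thread). In a real job the array would come from collecting the small DataFrame on the driver:

```scala
object RangeLookup {
  // The small ranges table, collected to the driver -- in Spark this
  // would be something like: rangesDF.map(...).collect()
  // Here it is hard-coded for illustration.
  val ranges: Array[(Int, Int, String)] = Array(
    (0, 10, "A"),
    (10, 20, "B"),
    (20, 30, "C")
  )

  // The UDF body: find the range whose [min, max) interval contains d.
  def lookup(d: Int): String =
    ranges
      .find { case (min, max, _) => d >= min && d < max }
      .map(_._3)
      .getOrElse("UNKNOWN")

  // In Spark the registration and query would look roughly like:
  //   sqlContext.udf.register("rangeName", lookup _)
  //   sqlContext.sql("SELECT duration, rangeName(duration) FROM events")

  def main(args: Array[String]): Unit = {
    println(lookup(3))   // A
    println(lookup(12))  // B
    println(lookup(26))  // C
  }
}
```

Because the ranges array lives in the UDF's closure, Spark ships it to every executor automatically, so the lookup happens map-side with no join at all.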
> You can broadcast your json data and then do a map-side join. This article
> is a good start: http://dmtolpeko.com/2015/02/20/map-side-join-in-spark/
>
> Thanks
> Best Regards
>
> On Wed, Dec 16, 2015 at 2:51 AM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> I have a big folder containing ORC files. The files have a duration field
>> (e.g. 3, 12, 26, etc.)
>> I also have a small json file (just 8 rows) with range definitions (min,
>> max, name):
>> 0, 10, A
>> 10, 20, B
>> 20, 30, C
>> etc.
>>
>> Because I can not do an equi-join between duration and the range min/max,
>> I need to do a cross join and apply a WHERE condition to keep the records
>> which belong to the range.
>> A cross join is an expensive operation; I think this particular join is
>> better done as a map join.
>>
>> How do I do a map join in Spark SQL?
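The broadcast-based map-side join suggested above can be sketched as follows. This is a plain-Scala simulation of the per-partition logic, with the Spark broadcast step shown in comments; the data and method names are illustrative assumptions:

```scala
object MapSideJoin {
  // The small json table (min, max, name) -- small enough to ship to
  // every executor. In Spark: val bc = sc.broadcast(smallTable)
  val smallTable: Array[(Int, Int, String)] = Array(
    (0, 10, "A"),
    (10, 20, "B"),
    (20, 30, "C")
  )

  // Map-side join: each record of the big dataset is matched against the
  // small table locally, so there is no shuffle and no cross join.
  def join(durations: Seq[Int]): Seq[(Int, String)] =
    durations.map { d =>
      val name = smallTable
        .find { case (min, max, _) => d >= min && d < max }
        .map(_._3)
        .getOrElse("UNKNOWN")
      (d, name)
    }

  // The Spark equivalent would look roughly like:
  //   val bc = sc.broadcast(smallTable)
  //   bigRdd.map { d =>
  //     (d, bc.value.find { case (min, max, _) => d >= min && d < max }
  //               .map(_._3).getOrElse("UNKNOWN"))
  //   }

  def main(args: Array[String]): Unit = {
    println(join(Seq(3, 12, 26)))
  }
}
```

The cross join plus WHERE filter from the original question and this map-side lookup produce the same rows; the difference is that here the 8-row table is replicated to each task instead of being joined against every record of the big dataset.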