Re: Join between Streaming data vs Historical Data in spark

2015-05-05 Thread Rendy Bambang Junior
Thanks. Since the join will run on a regular basis at a short interval (let's say every 20 s), do you have any suggestions on how to make it faster? I am thinking of partitioning the data set and caching it. Rendy On Apr 30, 2015 6:31 AM, Tathagata Das t...@databricks.com wrote: Have you taken a look at the
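The idea of partitioning the historical data set by key and caching it can be sketched outside Spark. The following is a plain-Python illustration only, not Spark API: the function names, the partition count, and the stand-in hash function are all hypothetical, and the rows are the sample visit and transaction rows from the original question. In Spark itself the corresponding steps would presumably be `partitionBy` and `cache` on the historical pair RDD, with the join applied per batch through the DStream `transform` operation described in the programming guide.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_of(user_id, num_partitions=NUM_PARTITIONS):
    """Deterministic stand-in for a hash partitioner (not Spark's implementation)."""
    return sum(user_id.encode("utf-8")) % num_partitions


def build_cached_index(visits, num_partitions=NUM_PARTITIONS):
    """Partition the historical visit rows by userId once and keep them in memory."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for user_id, source, ts in visits:
        partitions[partition_of(user_id, num_partitions)][user_id].append((source, ts))
    return partitions


def join_batch(batch, index):
    """Inner-join one micro-batch of transactions against the cached index."""
    joined = []
    for user_id, total_price, ts in batch:
        # Only the partition that owns this key is consulted.
        part = index[partition_of(user_id, len(index))]
        for source, visit_ts in part.get(user_id, []):
            joined.append((user_id, total_price, ts, source, visit_ts))
    return joined


# Sample rows from the original question.
visits = [("A", "google ads", 1), ("A", "facebook ads", 2)]
transactions = [("A", 100, 248384), ("B", 200, 43298739)]

index = build_cached_index(visits)       # done once, reused for every batch
result = join_batch(transactions, index)  # done for each short batch interval
```

The point of the sketch is that the expensive step (grouping the historical rows by key) happens once, so each batch only pays for key lookups. User B has no visit rows, so an inner join drops that transaction; whether that is the desired behaviour depends on the part of the question that is cut off here.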

Re: Join between Streaming data vs Historical Data in spark

2015-04-29 Thread Tathagata Das
Have you taken a look at the join section in the streaming programming guide? http://spark.apache.org/docs/latest/streaming-programming-guide.html#stream-dataset-joins On Wed, Apr 29, 2015 at 7:11 AM, Rendy Bambang Junior rendy.b.jun...@gmail.com wrote: Let's say I have transaction data and

Join between Streaming data vs Historical Data in spark

2015-04-29 Thread Rendy Bambang Junior
Let's say I have transaction data and visit data:

visit
| userId | Visit source | Timestamp |
| A | google ads | 1 |
| A | facebook ads | 2 |

transaction
| userId | total price | timestamp |
| A | 100 | 248384 |
| B | 200 | 43298739 |

I