Re: Hi all,

2017-11-04 Thread אורן שמון
Hi Jean, We prepare the data for all the other jobs. We have a lot of jobs scheduled at different times, but all of them need to read the same raw data. On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin wrote: > Hi Oren, > > Why don’t you want to use a GroupBy? You can cache

Re: Hi all,

2017-11-03 Thread Jean Georges Perrin
Hi Oren, Why don’t you want to use a GroupBy? You can cache or checkpoint the result and use it in your process, keeping everything in Spark and avoiding save/ingestion... > On Oct 31, 2017, at 08:17, אורן שמון <oren.sha...@gmail.com> wrote: > > I have 2 spark jobs one is pre-process and

Hi all,

2017-10-31 Thread אורן שמון
I have 2 Spark jobs: one is the pre-process and the second is the process. The process job needs to run a calculation for each user in the data. I want to avoid a shuffle such as groupBy, so I am thinking of saving the result of the pre-process bucketed by user in Parquet, or of repartitioning by user and saving the result.
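
The idea in this message can be illustrated with a small plain-Python sketch. It is not Spark code: it only mimics bucketing by user, so that a later per-user calculation can work on one bucket at a time without regrouping the whole data set. The bucket count, the `(user, value)` record shape, and every function name here are assumptions made for illustration. In Spark itself, the closest equivalents are probably `DataFrameWriter.bucketBy(...)` together with `saveAsTable(...)`, or `repartition` on the user column before writing.

```python
from collections import defaultdict
from zlib import crc32

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for the example


def bucket_of(user_id, num_buckets=NUM_BUCKETS):
    """Return a deterministic bucket index for a user.

    crc32 is used instead of hash() so the assignment is stable across runs.
    """
    return crc32(user_id.encode("utf-8")) % num_buckets


def pre_process(records, num_buckets=NUM_BUCKETS):
    """Pre-process job: split raw (user, value) records into buckets by user."""
    buckets = defaultdict(list)
    for user, value in records:
        buckets[bucket_of(user, num_buckets)].append((user, value))
    return dict(buckets)


def process_bucket(bucket_records):
    """Process job, one bucket at a time: per-user totals.

    All records of a given user live in the same bucket, so no data from
    other buckets is needed (the analogue of avoiding a shuffle).
    """
    totals = defaultdict(int)
    for user, value in bucket_records:
        totals[user] += value
    return dict(totals)


def process(buckets):
    """Run the per-bucket calculation and merge the disjoint results."""
    results = {}
    for bucket_records in buckets.values():
        results.update(process_bucket(bucket_records))
    return results


# Made-up sample data standing in for the raw input.
raw = [("alice", 1), ("bob", 5), ("alice", 2), ("carol", 7), ("bob", 1)]
buckets = pre_process(raw)
totals = process(buckets)
print(totals)
```

The grouping cost is paid once in the pre-process step, and every later job that reads the bucketed output can reuse that layout. This matters when many jobs read the same prepared data.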

hi all

2014-10-16 Thread Paweł Szulc
Hi, I just wanted to say hi all to the Spark community. I'm developing some stuff right now using Spark (we've started very recently). As the API documentation of Spark is really, really good, I'd like to get deeper knowledge of the internal stuff (you know, the goodies). Watching movies from Spark