Re: Hi all,

2017-11-04 Thread אורן שמון
Hi Jean,
We prepare the data for all the other jobs. We have many jobs that are
scheduled at different times, but all of them need to read the same raw data.
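
For example, each of those downstream jobs looks roughly like this (a minimal
sketch; the table name preprocessed_events and the user_id column are
placeholders, not names from this thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Each scheduled job reads the same pre-processed table instead of the raw data.
    val spark = SparkSession.builder()
      .appName("downstream-job")
      .enableHiveSupport() // needed to resolve a bucketed metastore table
      .getOrCreate()

    val events = spark.table("preprocessed_events")

    // If the table was bucketed by user_id, this per-user aggregation
    // can be planned without an extra shuffle.
    val perUser = events.groupBy("user_id").agg(count(lit(1)).as("event_count"))
    perUser.write.mode("overwrite").parquet("/output/per_user_counts")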

On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin wrote:

> Hi Oren,
>
> Why don’t you want to use a GroupBy? You can cache or checkpoint the
> result and use it in your process, keeping everything in Spark and avoiding
> save/ingestion...
>
>
> > On Oct 31, 2017, at 08:17, אורן שמון <oren.sha...@gmail.com> wrote:
> >
> > I have two Spark jobs: one is the pre-process and the second is the process.
> > The process job needs to run a calculation for each user in the data.
> > I want to avoid a shuffle like groupBy, so I am thinking about saving the
> > result of the pre-process bucketed by user in Parquet, or repartitioning
> > by user and saving the result.
> >
> > Which is preferred, and why?
> > Thanks in advance,
> > Oren
>
>


Re: Hi all,

2017-11-03 Thread Jean Georges Perrin
Hi Oren,

Why don’t you want to use a GroupBy? You can cache or checkpoint the result and 
use it in your process, keeping everything in Spark and avoiding 
save/ingestion...
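
Roughly along these lines (a minimal sketch; the column names, paths, and the
aggregation itself are just placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("process").getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    val raw = spark.read.parquet("/data/raw")

    // Shuffle once, then keep the grouped result around for everything downstream.
    val perUser = raw.groupBy("user_id").agg(collect_list("event").as("events"))

    val reusable = perUser.cache()         // keep it in memory/disk for this application
    // val reusable = perUser.checkpoint() // or cut the lineage and persist to the checkpoint dir

    reusable.count() // materialize once; later actions reuse the result instead of re-shuffling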


> On Oct 31, 2017, at 08:17, אורן שמון <oren.sha...@gmail.com> wrote:
> 
> I have two Spark jobs: one is the pre-process and the second is the process.
> The process job needs to run a calculation for each user in the data.
> I want to avoid a shuffle like groupBy, so I am thinking about saving the
> result of the pre-process bucketed by user in Parquet, or repartitioning
> by user and saving the result.
> 
> Which is preferred, and why?
> Thanks in advance,
> Oren





Hi all,

2017-10-31 Thread אורן שמון
I have two Spark jobs: one is the pre-process and the second is the process.
The process job needs to run a calculation for each user in the data.
I want to avoid a shuffle like groupBy, so I am thinking about saving the
result of the pre-process bucketed by user in Parquet, or repartitioning
by user and saving the result.
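
In code, the two options might look roughly like this (a sketch; the paths,
the bucket count, and the user_id column name are placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("pre-process").enableHiveSupport().getOrCreate()
    val preprocessed = spark.read.parquet("/data/raw") // stand-in for the real pre-process output

    // Option 1: bucket by user. Bucketed output must be written with
    // saveAsTable (a metastore table), not to a bare path, and the recorded
    // bucketing metadata lets later per-user operations skip the shuffle.
    preprocessed.write
      .format("parquet")
      .bucketBy(64, "user_id")
      .sortBy("user_id")
      .saveAsTable("preprocessed_events")

    // Option 2: repartition by user and save as plain Parquet. Each user's rows
    // land in the same file, but the partitioning is not recorded anywhere,
    // so a later groupBy on user_id may still shuffle.
    preprocessed
      .repartition(col("user_id"))
      .write
      .mode("overwrite")
      .parquet("/data/preprocessed")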

Which is preferred, and why?
Thanks in advance,
Oren


hi all

2014-10-16 Thread Paweł Szulc
Hi,

I just wanted to say hi to the Spark community. I'm developing some
stuff right now using Spark (we started very recently). As the API
documentation of Spark is really good, I'd like to get deeper
knowledge of the internal stuff (you know, the goodies). Watching videos
from Spark Summits helps; nevertheless, I hope to learn a lot from reading
this mailing list.

Regards,
Pawel Szulc