Hi Jean,
We prepare the data for all the other jobs. We have many jobs that are
scheduled at different times, but all of them need to read the same raw data.
On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin wrote:
Hi Oren,
Why don’t you want to use a GroupBy? You can cache or checkpoint the result and
use it in your process, keeping everything in Spark and avoiding
save/ingestion...
> On Oct 31, 2017, at 08:17, אורן שמון <oren.sha...@gmail.com> wrote:
>
I have 2 Spark jobs: one is the pre-process and the second is the process.
The process job needs to run a calculation for each user in the data.
I want to avoid a shuffle like groupBy, so I am thinking about saving the
result of the pre-process bucketed by user in Parquet, or repartitioning
by user and saving the result.
Hi,
I just wanted to say hi to the Spark community. I'm developing some
stuff right now using Spark (we've started very recently). As the API
documentation of Spark is really good, I'd like to get deeper
knowledge of the internal stuff - you know, the goodies. Watching movies
from Spark