Re: tuning - Spark data serialization for cache() ?

2017-08-08 Thread Kazuaki Ishizaki
Kazuaki Ishizaki

From: Ofir Manor <ofir.ma...@equalum.io>
To: Kazuaki Ishizaki <ishiz...@jp.ibm.com>
Cc: user <user@spark.apache.org>
Date: 2017/08/08 03:12
Subject: Re: tuning - Spark data serialization for cache() ?

Thanks a lot for the quick pointer! S

Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Ofir Manor
> To: user <user@spark.apache.org>
> Date: 2017/08/08 02:04
> Subject: tuning - Spark data serialization for cache() ?
>
> Hi,
> I'm using Spark 2.2, and have a big batch job, using dataframes (with
> built-in

Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Kazuaki Ishizaki
that these PRs will be integrated into Spark 2.3.

Kazuaki Ishizaki

From: Ofir Manor <ofir.ma...@equalum.io>
To: user <user@spark.apache.org>
Date: 2017/08/08 02:04
Subject: tuning - Spark data serialization for cache() ?

Hi, I'm using Spark 2.2, and have a big batc

tuning - Spark data serialization for cache() ?

2017-08-07 Thread Ofir Manor
Hi, I'm using Spark 2.2, and have a big batch job, using dataframes (with built-in, basic types). It references the same intermediate dataframe multiple times, so I wanted to try to cache() that and see if it helps, both in memory footprint and performance. Now, the Spark 2.2 tuning page (