Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Kazuaki Ishizaki
Kazuaki Ishizaki
From: Ofir Manor
To: Kazuaki Ishizaki
Cc: user
Date: 2017/08/08 03:12
Subject: Re: tuning - Spark data serialization for cache() ?
Thanks a lot for the quick pointer! So, is the advice I linked to in the official Spark 2.2 documentation misleading? You are

Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Ofir Manor
re
> working for alleviating these issues in https://issues.apache.org/jira/browse/SPARK-14098.
> We expect that these PRs will be integrated into Spark 2.3.
>
> Kazuaki Ishizaki
>
> From: Ofir Manor
> To: user
> Date: 2017/08/08 02:04
> S

Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Kazuaki Ishizaki
these PRs will be integrated into Spark 2.3.
Kazuaki Ishizaki
From: Ofir Manor
To: user
Date: 2017/08/08 02:04
Subject: tuning - Spark data serialization for cache() ?
Hi, I'm using Spark 2.2, and have a big batch job, using dataframes (with built-in, basic types

tuning - Spark data serialization for cache() ?

2017-08-07 Thread Ofir Manor
Hi, I'm using Spark 2.2, and have a big batch job, using dataframes (with built-in, basic types). It references the same intermediate dataframe multiple times, so I wanted to try to cache() that and see if it helps, both in memory footprint and performance. Now, the Spark 2.2 tuning page ( http://