Whenever Spark reads data, it keeps it in executor memory until there is no longer room for newly read or processed data. That is the beauty of Spark.
On Tue, Apr 3, 2018 at 12:42 AM snjv wrote:
Hi,
When we execute the same operation twice, Spark takes ~40% less time on the second run than on the first.
Our operation is like this:
Read 150M rows (spread across multiple parquet files) into a DataFrame.
Read 10M rows (spread across multiple parquet files) into another DataFrame.
Perform an intersect operation on the two DataFrames.
Size of 150M row file: