Whenever Spark reads data, it keeps it in executor memory until there is no longer room for newly read or processed data. That is the beauty of Spark.
On Tue, Apr 3, 2018 at 12:42 AM snjv wrote:
Hi,
When we execute the same operation twice, Spark takes ~40% less time on the second run than on the first.
Our operation is like this:
Read 150M rows (spread across multiple parquet files) into a DataFrame.
Read 10M rows (spread across multiple parquet files) into another DataFrame.
Perform an intersect operation on the two DataFrames.
Size of 150M row file: