Re: SparkSQL: First query execution is always slower than subsequent queries

2015-10-07 Thread Michael Armbrust

-dev +user

> 1). Is that the reason why it's always slow in the first run? Or are there
> any other reasons? Apparently it loads data to memory every time so it
> shouldn't be something to do with disk read should it?

You are probably seeing the effect of the JVM's JIT. The first run is

SparkSQL: First query execution is always slower than subsequent queries

2015-10-07 Thread Lloyd Haris

Hi Spark Devs, I am doing a performance evaluation of Spark using pyspark. I am using Spark 1.5 with a Hadoop 2.6 cluster of 4 nodes, and I ran these tests in local mode. After a few dozen test executions, it turned out that the very first SparkSQL query execution is always slower than the
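For anyone repeating this kind of measurement, here is a minimal sketch of one way to keep the first (warm-up) run separate from the later runs when timing a query. It is plain Python and does not come from the thread: the helper names time_runs, split_warmup, mean and example_query are made up for illustration, and example_query is only a stand-in workload, not a real SparkSQL query.

```python
import time


def time_runs(run_query, repeats=5):
    """Call run_query `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def split_warmup(durations, warmup=1):
    """Separate the first `warmup` timings from the remaining ones."""
    return durations[:warmup], durations[warmup:]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def example_query():
    # Hypothetical stand-in workload (an assumption, not a Spark call).
    return sum(i * i for i in range(100000))


if __name__ == "__main__":
    timings = time_runs(example_query, repeats=5)
    first, rest = split_warmup(timings, warmup=1)
    print("first run: %.4f s" % first[0])
    print("mean of later runs: %.4f s" % mean(rest))
```

In a pyspark session, run_query could be something like lambda: sqlContext.sql(query).collect(), assuming an existing sqlContext and a query string of your own. Reporting the first run separately from the mean of the rest makes one-off start-up costs visible instead of folding them into an average.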