Hi,

I am using Spark 1.4 on a cluster (stand-alone mode), across 3 machines,
for a workload similar to TPCH (analytical queries with multiple/multi-way
large joins and aggregations). Each machine has 12GB of Memory and 4 cores.
My total data size is 150GB, stored in HDFS (stored as Hive tables), and I
am running my queries through Spark SQL using hive context.
After checking the performance tuning documents on the spark page and some
clips from latest spark summit, I decided to set the following configs in
my spark-env:

SPARK_WORKER_INSTANCES=4
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=2500M

(As my tasks tend to be long so the overhead of starting multiple JVMs, one
per worker is much less than the total query times). As I monitor the job
progress, I realized that while the Worker memory is 2.5GB, the executors
(one per worker) have max memory of 512MB (which is default). I enlarged
this value in my application as:

conf.set("spark.executor.memory", "2.5g");

Trying to give max available memory on each worker to its only executor,
but I observed that my queries are running slower than the prev case
(default 512MB). Changing 2.5g to 1g improved the performance time, it is
close to but still worse than 512MB case. I guess what I am missing here is
what is the relationship between the "WORKER_MEMORY" and 'executor.memory'.

- Isn't it the case that WORKER tries to split this memory among its
executors (in my case its only executor) ? Or there are other stuff being
done worker which need memory ?

- What other important parameters I need to look into and tune at this
point to get the best response time out of my HW ? (I have read about Kryo
serializer, and I am about trying that - I am mainly concerned about memory
related settings and also knobs related to parallelism of my jobs). As an
example, for a simple scan-only query, Spark is worse than Hive (almost 3
times slower) while both are scanning the exact same table & file format.
That is why I believe I am missing some params by leaving them as defaults.

Any hint/suggestion would be highly appreciated.

Reply via email to