Hi all, I am fairly new to Spark and am wondering if you can help me. I am exploring GraphX/Spark by running the PageRank example on a medium-sized graph (12 GB) using this command:
/home/ubuntu/spark-1.3.0/bin/spark-submit --master spark://<Master IP>:7077 \
  --class org.apache.spark.examples.graphx.Analytics \
  /home/ubuntu/spark-1.3.0/examples/target/scala-2.10/spark-examples-1.3.0-hadoop1.0.4.jar \
  pagerank /user/ubuntu/input/<dataset> \
  --numEPart=64 --output=/user/ubuntu/spark/16_pagerank --numIter=30

My cluster is 1 master + 16 workers: the master has 15 GB of memory and 2 cores, and each worker has 30 GB of memory and 4 cores.

I have two questions:

1- When I set SPARK_EXECUTOR_MEMORY=25000M, I got errors because the master cannot allocate this much memory, since the launched JVM includes "-Xms25000M". As I understand it, the master does not do any computation, so this executor memory should only be needed on the worker machines. Why can't the application start without allocating the full executor memory on the master as well as on all the workers?

2- I reduced the executor memory to 15 GB and the application ran fine. However, it did not finish the thirty iterations even after 7 hours. One of them was taking 4+ hours, and its input is 400+ GB. I must be doing something wrong; any comments?

--
Thanks,
-Khaled Ammar
www.khaledammar.com
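
P.S. Regarding question 1, one workaround I am considering is setting the memory per application on the submit command rather than cluster-wide through SPARK_EXECUTOR_MEMORY. This is only a sketch, assuming spark-submit's --executor-memory flag behaves the same way in standalone mode; paths and <placeholders> are as in the command above:

```shell
# Sketch: request executor memory per application via --executor-memory
# instead of exporting SPARK_EXECUTOR_MEMORY for the whole cluster.
# (Assumes standalone mode honors this flag; placeholders as above.)
/home/ubuntu/spark-1.3.0/bin/spark-submit \
  --master spark://<Master IP>:7077 \
  --executor-memory 15g \
  --class org.apache.spark.examples.graphx.Analytics \
  /home/ubuntu/spark-1.3.0/examples/target/scala-2.10/spark-examples-1.3.0-hadoop1.0.4.jar \
  pagerank /user/ubuntu/input/<dataset> \
  --numEPart=64 --output=/user/ubuntu/spark/16_pagerank --numIter=30
```

If anyone knows whether this avoids the driver/master also reserving the executor heap, I would appreciate a pointer.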