Hi all, I am fairly new to Spark and am wondering if you can help me. I am exploring GraphX/Spark by running the PageRank example on a medium-sized graph (12 GB) using this command:
/home/ubuntu/spark-1.3.0/bin/spark-submit --master spark://<Master IP>:7077 \
  --class org.apache.spark.examples.graphx.Analytics \
  /home/ubuntu/spark-1.3.0/examples/target/scala-2.10/spark-examples-1.3.0-hadoop1.0.4.jar \
  pagerank /user/ubuntu/input/<dataset> \
  --numEPart=64 --output=/user/ubuntu/spark/16_pagerank --numIter=30

My cluster is 1 master + 16 workers: the master has 15 GB of memory and 2 cores, and each worker has 30 GB of memory and 4 cores.

I have two questions:

1- When I set SPARK_EXECUTOR_MEMORY=25000M, I got errors because the master cannot allocate this much memory, since the launched JVM includes "-Xms25000M". As I understand it, the master does not do any computation, so this executor memory should only be needed on the worker machines. Why can't the application start without allocating the full executor memory on the master as well as on all the workers?

2- I reduced the executor memory to 15 GB and the application ran fine. However, it did not finish the thirty iterations even after 7 hours. One of them was taking 4+ hours, and its input is 400+ GB. I must be doing something wrong; any comments?

--
Thanks,
-Khaled Ammar
www.khaledammar.com
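
P.S. Regarding question 1, one workaround I am considering is setting the memory per application on the submit command rather than cluster-wide through SPARK_EXECUTOR_MEMORY. This is only a sketch, assuming spark-submit's --executor-memory flag behaves the same way in standalone mode; paths and <placeholders> are as in the command above:

```shell
# Sketch: request executor memory per application via --executor-memory
# instead of exporting SPARK_EXECUTOR_MEMORY for the whole cluster.
# (Assumes standalone mode honors this flag; placeholders as above.)
/home/ubuntu/spark-1.3.0/bin/spark-submit \
  --master spark://<Master IP>:7077 \
  --executor-memory 15g \
  --class org.apache.spark.examples.graphx.Analytics \
  /home/ubuntu/spark-1.3.0/examples/target/scala-2.10/spark-examples-1.3.0-hadoop1.0.4.jar \
  pagerank /user/ubuntu/input/<dataset> \
  --numEPart=64 --output=/user/ubuntu/spark/16_pagerank --numIter=30
```

If anyone knows whether this avoids the driver/master also reserving the executor heap, I would appreciate a pointer.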