Hi all, I wonder if anyone has an explanation for this behavior.
Thank you,
-Khaled

---------- Forwarded message ----------
From: Khaled Ammar <khaled.am...@gmail.com>
Date: Fri, Jul 24, 2015 at 9:35 AM
Subject: Performance questions regarding Spark 1.3 standalone mode
To: user@spark.apache.org

Hi all,

I have a standalone Spark cluster set up on EC2 machines. I did the setup manually, without the ec2 scripts.

I have two questions about Spark/GraphX performance:

1) When I run the PageRank example, the Storage tab does not show all RDDs as fully cached. Only one RDD is 100% cached; the others range from 25% to 97%. Kindly note that there is enough memory to cache all RDDs.

2) I noticed that loading the dataset partitions, 25 GB in total, is not always evenly distributed across executors. Occasionally, one or two executors become responsible for loading several partitions, while the others load only 1 partition each.

Does anyone know the reason behind this behavior? Is it a bug, or is it possible to fix it using configuration parameters?

--
Thanks,
-Khaled