Hi all,

I wonder if anyone has an explanation for this behavior.

Thank you,
-Khaled

---------- Forwarded message ----------
From: Khaled Ammar <khaled.am...@gmail.com>
Date: Fri, Jul 24, 2015 at 9:35 AM
Subject: Performance questions regarding Spark 1.3 standalone mode
To: user@spark.apache.org


Hi all,

I have a standalone Spark cluster set up on EC2 machines. I did the setup
manually without the ec2 scripts. I have two questions about Spark/GraphX
performance:

1) When I run the PageRank example, the Storage tab does not show all RDDs
as fully cached. Only one RDD is 100% cached; the rest range from 25% to
97%. Note that there is enough memory to cache all RDDs.

2) I noticed that loading the dataset partitions (25 GB in total) is not
always evenly distributed across executors. Occasionally, one or two
executors become responsible for loading several partitions, while the
others load only one partition each. Does anyone know the reason behind
this behavior? Is it a bug, or is it possible to fix this using
configuration parameters?

-- 
Thanks,
-Khaled



