Date: Wed, 27 Aug 2014 20:29:38 +0530
Subject: Execution time increasing with increase of cluster size
From: sarathchandra.jos...@algofusiontech.com
To: user@spark.apache.org

Hi,
I've written a simple Scala program which reads a file from HDFS (a delimited
file with 100 fields and 1 million rows), splits each row on the delimiter,
computes the hashcode of each field, builds new rows from these hashcodes, and
writes these rows back to HDFS. Code attached.
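Since the attachment isn't shown here, the following is a minimal sketch of the kind of program described, assuming a comma delimiter; the object name and HDFS paths are illustrative, not from the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HashFields {
  // Pure per-row transformation: split on the delimiter (keeping empty
  // trailing fields via limit -1), replace each field with its hashCode,
  // and re-join with the same delimiter.
  def hashRow(line: String, delim: String = ","): String =
    line.split(delim, -1).map(_.hashCode.toString).mkString(delim)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HashFields"))
    sc.textFile("hdfs:///data/input.csv")    // hypothetical input path
      .map(hashRow(_))
      .saveAsTextFile("hdfs:///data/output") // hypothetical output path
    sc.stop()
  }
}
```

Note that `hashRow` is a plain function, so the per-row logic can be unit-tested without a cluster.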

When I run this on a Spark cluster of 2 nodes (these 2 nodes also act as the
HDFS cluster), it takes about 35 sec to complete. Then I increased the cluster
to 4 nodes (the additional nodes are not part of the HDFS cluster) and
submitted the same job. I was expecting a decrease in execution time, but
instead it took about 3 times longer (1.6 min) to complete. Snapshots of the
execution summary are attached.

Both times I set the executor memory to 6GB, which is available on all the
nodes.
What am I missing here? Do I need to do any additional configuration when
increasing the cluster size?
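For reference, a hypothetical submit command matching the setup described (the class name, jar name, and master URL are assumptions; only the 6GB executor memory comes from the message):

```shell
spark-submit \
  --class HashFields \
  --master spark://master:7077 \
  --executor-memory 6G \
  hashfields.jar
```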

~Sarath


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org