Hi,

I'm using Java on Spark to process 30 GB of data every hour, submitting with
spark-submit in cluster mode. I have a cluster of 11 machines (9 with 64 GB
of memory and 2 with 32 GB), but it still takes 30 minutes to process each
hour's 30 GB of data. How can I optimize this? How should I compute the
driver and executor memory from the machine configuration? I'm using the
following Spark configuration:

sparkConf.setMaster("yarn-cluster");
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.driver.memory", "2g");
sparkConf.set("spark.executor.memory", "2g");
sparkConf.set("spark.storage.memoryFraction", "0.5");
sparkConf.set("spark.shuffle.memoryFraction", "0.4");

Thanks,
https://in.linkedin.com/in/ramkumarcs31
