Hi, I'm using Java on Spark to process 30 GB of data every hour, submitting with spark-submit in yarn-cluster mode. I have a cluster of 11 machines (9 with 64 GB of memory and 2 with 32 GB), but each hourly 30 GB batch takes 30 minutes to process. How can I optimize this? How should I compute the driver and executor memory from the machine configuration? I'm using the following Spark configuration:
sparkConf.setMaster("yarn-cluster"); sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"); sparkConf.set("spark.driver.memory", "2g"); sparkConf.set("spark.executor.memory", "2g"); sparkConf.set("spark.storage.memoryFraction", "0.5"); sparkConf.set("spark.shuffle.memoryFraction", "0.4" ); *Thanks*, <https://in.linkedin.com/in/ramkumarcs31>