Re: spark-defaults.conf optimal configuration
Hello Neelesh,

Thank you for the checklist for determining the correct configuration of Spark. I will go through these and let you know if I have further questions.

Regards,
Chris

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-defaults-conf-optimal-configuration-tp25641p25649.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
spark-defaults.conf optimal configuration
I am seeking help with a Spark configuration for running queries against a cluster of 6 machines. Each machine runs Spark 1.5.1; slaves are started on all 6, and one machine also acts as the master/thriftserver. From Beeline I query 2 tables that have 300M and 31M rows respectively. Queries returning up to 500M rows succeed against Oracle, but Spark errors out on anything more than 5.5M rows.

I believe there is an optimal memory configuration that must be set for each of the workers in our cluster, but I have not been able to determine that setting. Is there something better than trial and error? Are there settings to avoid, such as making sure not to set spark.driver.maxResultSize > spark.driver.memory? Is there a formula or guideline by which to calculate the correct Spark configuration values given a machine's available cores and memory resources?

This is my current configuration:

BDA v3 server: SUN SERVER X4-2L
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
CPU cores: 32
GB of memory (>=63): 63
Number of disks: 12

spark-defaults.conf:

spark.driver.memory 20g
spark.executor.memory 40g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.rpc.askTimeout 6000s
spark.rpc.lookupTimeout 3000s
spark.driver.maxResultSize 20g
spark.rdd.compress true
spark.storage.memoryFraction 1
spark.core.connection.ack.wait.timeout 600
spark.akka.frameSize 500
spark.shuffle.compress true
spark.shuffle.file.buffer 128k
spark.shuffle.memoryFraction 0
spark.shuffle.spill.compress true
spark.shuffle.spill true

Thank you,
Chris

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-defaults-conf-optimal-configuration-tp25641.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
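[Editor's note: the "formula or guideline" question above can be illustrated with a commonly cited rule of thumb for sizing executors from a node's cores and memory (roughly 5 cores per executor, a core and some memory reserved for the OS, and an off-heap overhead allowance). The function, its parameters, and all constants below are illustrative assumptions for discussion, not official Spark guidance or Neelesh's checklist.]

```python
# Sketch of a rule-of-thumb executor sizing calculation, applied to the
# machines described above (32 cores, 63 GB RAM per node). All defaults
# here are assumptions for illustration only.

def size_executors(cores_per_node, mem_per_node_gb,
                   cores_per_executor=5,   # ~5 cores/executor is a common heuristic
                   os_reserved_cores=1,    # leave a core for the OS/daemons
                   os_reserved_gb=1,       # leave some RAM for the OS/daemons
                   overhead_fraction=0.07):
    """Return (executors_per_node, executor_memory_gb)."""
    usable_cores = cores_per_node - os_reserved_cores
    executors_per_node = usable_cores // cores_per_executor
    usable_mem_gb = mem_per_node_gb - os_reserved_gb
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    # spark.executor.memory is the heap only; reserve a fraction of the
    # per-executor slice for off-heap/JVM overhead.
    executor_memory_gb = int(mem_per_executor_gb * (1 - overhead_fraction))
    return executors_per_node, executor_memory_gb

print(size_executors(32, 63))  # -> (6, 9): 6 executors of ~9g heap per node
```

Under these assumed constants, the heuristic would suggest several smaller executors per node rather than one 40g executor, which is one reason trial-and-error at the node level is hard to get right.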