Re: spark-defaults.conf optimal configuration

2015-12-09 Thread cjrumble
Hello Neelesh,

Thank you for the checklist for determining the correct Spark configuration.
I will work through it and let you know if I have further questions.

Regards,

Chris 




spark-defaults.conf optimal configuration

2015-12-08 Thread cjrumble
I am seeking help with the Spark configuration for running queries against a
cluster of 6 machines. Each machine runs Spark 1.5.1, with slaves started on
all 6 and 1 machine also acting as master/Thrift server. From Beeline I query
2 tables that have 300M and 31M rows respectively. The same queries return up
to 500M rows when run against Oracle, but Spark errors out on anything over
5.5M rows.

I believe there is an optimal memory configuration that must be set for each
of the workers in our cluster, but I have not been able to determine that
setting. Is there something better than trial and error? Are there settings
to avoid, such as making sure not to set spark.driver.maxResultSize >
spark.driver.memory?
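
For instance, my understanding (which may be wrong) is that results collected
to the driver must fit in the driver heap, so the result-size cap should sit
well below it. Something like the following, where the numbers are only my
guess to illustrate the constraint I mean:

# hypothetical values, not a recommendation
spark.driver.memory        20g
spark.driver.maxResultSize 15g   # keep headroom under the driver heap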

Is there a formula or set of guidelines for calculating the correct Spark
configuration values, given a machine's available cores and memory
resources?
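
For example, applying a rule of thumb I have seen cited (reserve a core and a
few GB per machine for the OS and daemons, and keep executors to roughly 5
cores each), I would work it out for one of my machines as below, though I do
not know whether this reasoning is sound:

# per machine: 32 cores, 63 GB RAM
# reserve 1 core + 3 GB for OS/daemons  -> 31 cores, 60 GB usable
# at ~5 cores per executor              -> 6 executors per machine
# 60 GB / 6 executors = 10 GB each; hold ~10% back for JVM overhead
spark.executor.cores   5
spark.executor.memory  9g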

This is my current configuration:

BDA v3 server : SUN SERVER X4-2L
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
CPU cores : 32
GB of memory (>=63) : 63
number of disks : 12

spark-defaults.conf:

spark.driver.memory                     20g
spark.executor.memory                   40g
spark.executor.extraJavaOptions         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.rpc.askTimeout                    6000s
spark.rpc.lookupTimeout                 3000s
spark.driver.maxResultSize              20g
spark.rdd.compress                      true
spark.storage.memoryFraction            1
spark.core.connection.ack.wait.timeout  600
spark.akka.frameSize                    500
spark.shuffle.compress                  true
spark.shuffle.file.buffer               128k
spark.shuffle.memoryFraction            0
spark.shuffle.spill.compress            true
spark.shuffle.spill                     true
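
For comparison, my memoryFraction settings are far from what I believe the
Spark 1.5 defaults are (values below are my reading of the 1.5 docs, so
please correct me if they are wrong):

# Spark 1.5 defaults, as I understand them
spark.storage.memoryFraction  0.6
spark.shuffle.memoryFraction  0.2

I wonder whether setting storage to 1 and shuffle to 0 is starving my queries
of shuffle memory.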

Thank you,

Chris


