Spark Streaming Shuffle to Disk

2015-12-04 Thread spearson23
I'm running a Spark Streaming job on Spark 1.3.1 that uses
updateStateByKey.  The job works perfectly fine at first, but at some point
(after a few runs) it starts shuffling to disk, no matter how much memory I
give the executors.

I have tried changing --executor-memory on spark-submit,
spark.shuffle.memoryFraction, spark.storage.memoryFraction, and
spark.storage.unrollFraction.  But no matter how I configure these, it
always spills to disk around 2.5GB.  
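
For reference, here is a sketch of how the settings listed above are usually passed on the command line. The values are placeholders chosen for illustration, not recommendations, and CLASS_NAME and the JAR path are hypothetical. In Spark 1.x the defaults are 0.2 for spark.shuffle.memoryFraction, 0.6 for spark.storage.memoryFraction, and 0.2 for spark.storage.unrollFraction.

```shell
# Placeholder values (assumptions for illustration only).
EXECUTOR_MEMORY="8g"
SHUFFLE_FRACTION="0.4"
STORAGE_FRACTION="0.4"
UNROLL_FRACTION="0.2"

# Build the command as a string so it can be reviewed before running.
CMD="spark-submit \
--executor-memory $EXECUTOR_MEMORY \
--conf spark.shuffle.memoryFraction=$SHUFFLE_FRACTION \
--conf spark.storage.memoryFraction=$STORAGE_FRACTION \
--conf spark.storage.unrollFraction=$UNROLL_FRACTION \
--class CLASS_NAME /PATH/TO/CODE.JAR"

# Print the command; paste it into a shell to actually submit.
echo "$CMD"
```

Keeping the fractions in variables makes it easier to try several combinations and compare where the spill starts.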

What is the best way to avoid spilling shuffle to disk?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Shuffle-to-Disk-tp25567.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: JMXSink for YARN deployment

2015-12-04 Thread spearson23
We use a metrics.properties file on YARN by submitting applications like this:

spark-submit --conf spark.metrics.conf=metrics.properties \
  --class CLASS_NAME --master yarn-cluster \
  --files /PATH/TO/metrics.properties \
  /PATH/TO/CODE.JAR /PATH/TO/CONFIG.FILE APP_NAME
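
The contents of metrics.properties are not shown in this thread. As a minimal sketch, assuming the goal is only to enable Spark's built-in JmxSink for every metrics instance, the file could be created like this before submitting. The property line matches the one in Spark's metrics.properties.template; writing it with a heredoc is just for illustration.

```shell
# Write a minimal metrics.properties into the current directory.
# Assumption: only the JMX sink is wanted, for all instances ("*").
cat > metrics.properties <<'EOF'
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
EOF

# Show what will be shipped to the cluster with --files.
cat metrics.properties
```

Because the file is shipped with --files, spark.metrics.conf can refer to it by its bare file name, as in the command above.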




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/JMXSink-for-YARN-deployment-tp13958p25570.html



Re: JMXSink for YARN deployment

2015-12-04 Thread spearson23
Run "spark-submit --help" to see all available options.

To get JMX to work you need to:

spark-submit \
  --driver-java-options "-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.port=JMX_PORT" \
  --conf spark.metrics.conf=metrics.properties \
  --class 'CLASS_NAME' --master yarn-cluster \
  --files /PATH/TO/metrics.properties /PATH/TO/JAR.FILE


This will run JMX on the driver node on port "JMX_PORT".  Note that the driver
node and the YARN master node are not the same; you'll have to find out which
node Spark placed the driver on and then connect there.
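
One possible way to find the driver node, sketched under assumptions: in yarn-cluster mode the driver runs inside the YARN ApplicationMaster, so the host reported by "yarn application -status" should be the one to connect to. The helper name am_host, the variables APP_ID and JMX_PORT, and the exact "AM Host : <hostname>" line format are assumptions; check them against the output of your Hadoop version.

```shell
# Extract the ApplicationMaster host from "yarn application -status" output.
# Assumption: the report contains a line of the form "AM Host : <hostname>".
am_host() {
  sed -n 's/^[[:space:]]*AM Host[[:space:]]*:[[:space:]]*//p'
}

# Usage (APP_ID and JMX_PORT are placeholders you must supply):
#   DRIVER_HOST=$(yarn application -status "$APP_ID" | am_host)
#   jconsole "$DRIVER_HOST:$JMX_PORT"
```

The application ID can be looked up with "yarn application -list" if it was not noted at submit time.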




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/JMXSink-for-YARN-deployment-tp13958p25572.html