Spark Streaming Shuffle to Disk
I'm running a Spark Streaming job on 1.3.1 which contains an updateStateByKey. The job works perfectly fine, but at some point (after a few runs), it starts shuffling to disk no matter how much memory I give the executors. I have tried changing --executor-memory on spark-submit, spark.shuffle.memoryFraction, spark.storage.memoryFraction, and spark.storage.unrollFraction. But no matter how I configure these, it always spills to disk at around 2.5 GB. What is the best way to avoid spilling shuffle to disk?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Shuffle-to-Disk-tp25567.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: JMXSink for YARN deployment
We use a metrics.properties file on YARN by submitting applications like this:

    spark-submit \
      --conf spark.metrics.conf=metrics.properties \
      --class CLASS_NAME \
      --master yarn-cluster \
      --files /PATH/TO/metrics.properties \
      /PATH/TO/CODE.JAR /PATH/TO/CONFIG.FILE APP_NAME

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JMXSink-for-YARN-deployment-tp13958p25570.html
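The message above does not show what goes inside metrics.properties. As a minimal sketch, assuming the only goal is to turn on Spark's JMX sink for all instances, the file could be created like this. The sink line follows the commented example in Spark's metrics.properties.template; anything else a real deployment needs is left out.

```shell
# Write a minimal metrics.properties that enables Spark's JMX sink.
# The leading "*" applies the sink to every instance (driver, executors, ...).
cat > metrics.properties <<'EOF'
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
EOF

# Print the file so it can be checked before passing it to --files.
cat metrics.properties
```

The resulting file is what --files ships to the cluster, and spark.metrics.conf=metrics.properties then refers to it by its bare name.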
Re: JMXSink for YARN deployment
Run "spark-submit --help" to see all available options. To get JMX working, submit the application like this:

    spark-submit \
      --driver-java-options "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=JMX_PORT" \
      --conf spark.metrics.conf=metrics.properties \
      --class 'CLASS_NAME' \
      --master yarn-cluster \
      --files /PATH/TO/metrics.properties \
      /PATH/TO/JAR.FILE

This exposes JMX on the driver node on port JMX_PORT. Note that the driver node and the YARN master node are not the same; you'll have to find out where Spark placed the driver and then connect to that host.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JMXSink-for-YARN-deployment-tp13958p25572.html
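One way to find the driver host, sketched here under some assumptions: in yarn-cluster mode the driver runs inside the YARN ApplicationMaster, so the "AM Host" line printed by "yarn application -status" should name the machine to connect to. The am_host helper below is hypothetical (it is not part of Spark or YARN), APP_ID is a placeholder for the real application id, and the parsing assumes the report contains a line of the form "AM Host : hostname".

```shell
# Hypothetical helper: read a `yarn application -status` report on stdin
# and print the value of its "AM Host" line (the driver host in yarn-cluster mode).
am_host() {
  awk -F' : ' '/AM Host/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# Usage (APP_ID is a placeholder for the YARN application id):
#   yarn application -status APP_ID | am_host
```

A JMX client such as jconsole can then be pointed at that host together with the JMX_PORT chosen at submit time.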