Re: Spark Streaming Shuffle to Disk

2015-12-10 Thread manasdebashiskar
How often do you checkpoint?
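Checkpointing matters here because, with updateStateByKey, each batch's state RDD is computed from the previous batch's state RDD. The lineage therefore grows by one generation per batch until a checkpoint truncates it. The Spark Streaming programming guide suggests a checkpoint interval of 5 to 10 sliding intervals as a starting point. The plain-Python helper below is a hypothetical sketch: the function names and the 2 s batch interval are made up for illustration, and nothing here calls Spark. It lists candidate intervals under that rule of thumb and counts how many batches chain together between checkpoints. It deliberately accepts only whole multiples of the batch interval, so each checkpoint lands on a batch boundary.

```python
# Rule of thumb from the Spark Streaming programming guide: checkpoint a
# stateful DStream every 5-10 slide intervals (for updateStateByKey the
# slide interval is the batch interval).
RULE_OF_THUMB = range(5, 11)


def batches_between_checkpoints(batch_ms: int, checkpoint_ms: int) -> int:
    """Number of batches whose state RDDs chain together before a checkpoint
    truncates the lineage. This sketch only accepts whole multiples of the
    batch interval so that every checkpoint lands on a batch boundary."""
    if batch_ms <= 0:
        raise ValueError("batch interval must be positive")
    if checkpoint_ms < batch_ms or checkpoint_ms % batch_ms != 0:
        raise ValueError(
            f"checkpoint interval {checkpoint_ms} ms is not a whole multiple "
            f"of the {batch_ms} ms batch interval"
        )
    return checkpoint_ms // batch_ms


def suggested_checkpoint_ms(batch_ms: int) -> list:
    """Candidate checkpoint intervals (ms) following the 5-10x rule of thumb."""
    return [batch_ms * k for k in RULE_OF_THUMB]


if __name__ == "__main__":
    batch = 2_000  # hypothetical 2 s batch interval
    print(suggested_checkpoint_ms(batch))  # → [10000, 12000, 14000, 16000, 18000, 20000]
    print(batches_between_checkpoints(batch, 10_000))  # → 5
```

In a real job, the chosen value would be passed to `DStream.checkpoint(...)` on the state stream, after a checkpoint directory has been set with `StreamingContext.checkpoint(directory)`.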

Re: Spark Streaming Shuffle to Disk

2015-12-07 Thread Akhil Das
updateStateByKey and your batch data could be filling up your executor memory, so it may be spilling to disk. You can verify this by watching the memory footprint while your job is running. The executor logs will also give you a better understanding of what's going on. Thanks
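The point about updateStateByKey is that it keeps state for every key in memory across batches. The update function is called for each key that has either previous state or new values in the current batch, and returning None removes that key. The pure-Python sketch below mimics that contract without Spark. The names `update_state_by_key`, `keep_forever`, `evict_after` and the one-new-key-per-batch workload are hypothetical. It shows that if keys keep arriving and the update function never returns None, the state grows every batch. Dropping keys that have been idle for a few batches keeps it bounded.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]  # (running total, consecutive idle batches)
UpdateFn = Callable[[List[int], Optional[State]], Optional[State]]


def update_state_by_key(state: Dict[str, State], batch: List[Tuple[str, int]],
                        update_fn: UpdateFn) -> Dict[str, State]:
    """Pure-Python stand-in for one batch of updateStateByKey (hypothetical).

    Every key with previous state OR new values is visited, so keys with no
    new data stay in memory unless update_fn returns None for them."""
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state: Dict[str, State] = {}
    for key in set(state) | set(grouped):
        result = update_fn(grouped.get(key, []), state.get(key))
        if result is not None:  # None means "drop this key"
            new_state[key] = result
    return new_state


def keep_forever(values: List[int], old: Optional[State]) -> Optional[State]:
    """Never evicts: state grows with every distinct key ever seen."""
    total = (old[0] if old else 0) + sum(values)
    return (total, 0)


def evict_after(max_idle: int) -> UpdateFn:
    """Drops a key once it has gone more than max_idle batches without data."""
    def update(values: List[int], old: Optional[State]) -> Optional[State]:
        total, idle = old if old else (0, 0)
        if values:
            return (total + sum(values), 0)
        idle += 1
        return None if idle > max_idle else (total, idle)
    return update


def run(update_fn: UpdateFn, n_batches: int) -> Dict[str, State]:
    state: Dict[str, State] = {}
    for i in range(n_batches):
        batch = [(f"user{i}", 1), (f"user{i}", 2)]  # hypothetical: one new key per batch
        state = update_state_by_key(state, batch, update_fn)
    return state


if __name__ == "__main__":
    print(len(run(keep_forever, 10)))    # → 10 (one entry per key ever seen)
    print(len(run(evict_after(2), 10)))  # → 3 (bounded at max_idle + 1)
```
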

Spark Streaming Shuffle to Disk

2015-12-04 Thread spearson23

Spark Streaming Shuffle to Disk

2015-12-03 Thread Steven Pearson
I'm running a Spark Streaming job on Spark 1.3.1 that contains an updateStateByKey. The job works perfectly fine, but at some point (after a few runs) it starts shuffling to disk no matter how much memory I give the executors. I have tried changing --executor-memory on spark-submit, …
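One possible reason that raising --executor-memory alone does not stop the spilling: in Spark 1.x (before the unified memory manager), the memory available for in-memory shuffle aggregation is a fixed fraction of the executor heap, shared among concurrently running tasks. Each of N active tasks can claim between 1/(2N) and 1/N of that pool before it must spill. The sketch below assumes the Spark 1.x defaults of `spark.shuffle.memoryFraction` = 0.2 and `spark.shuffle.safetyFraction` = 0.8. The function names and the 4 GiB / 4-task example are hypothetical, and the JVM's usable heap is somewhat smaller than --executor-memory, so treat the results as upper bounds. Shuffle map outputs are written to local disk regardless; "spilling" refers to aggregation buffers overflowing this budget.

```python
# Spark 1.x (pre-unified-memory) shuffle defaults; assumptions to be checked
# against the configuration actually in use.
SHUFFLE_MEMORY_FRACTION = 0.2   # spark.shuffle.memoryFraction
SHUFFLE_SAFETY_FRACTION = 0.8   # spark.shuffle.safetyFraction


def shuffle_pool_mb(heap_mb: float,
                    memory_fraction: float = SHUFFLE_MEMORY_FRACTION,
                    safety_fraction: float = SHUFFLE_SAFETY_FRACTION) -> float:
    """Heap (MB) shared by all running tasks for shuffle aggregation."""
    return heap_mb * memory_fraction * safety_fraction


def per_task_range_mb(heap_mb: float, concurrent_tasks: int, **fractions) -> tuple:
    """(min, max) MB one task can claim before spilling when N tasks are active:
    between 1/(2N) and 1/N of the shuffle pool."""
    pool = shuffle_pool_mb(heap_mb, **fractions)
    return pool / (2 * concurrent_tasks), pool / concurrent_tasks


if __name__ == "__main__":
    heap = 4096  # hypothetical 4 GiB executor heap
    print(round(shuffle_pool_mb(heap), 2))  # → 655.36
    lo, hi = per_task_range_mb(heap, 4)     # hypothetical 4 concurrent tasks
    print(round(lo, 2), round(hi, 2))       # → 81.92 163.84
```

Doubling the heap doubles every number above, but if the per-batch data or the state handled by each task keeps growing, the spilling would be expected to return once it outgrows the per-task share. Raising `spark.shuffle.memoryFraction` or reducing concurrent tasks per executor changes that share directly.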