Re: Spark Streaming Shuffle to Disk

2015-12-10 Thread manasdebashiskar
how often do you checkpoint?
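
For context, updateStateByKey requires a checkpoint directory, and how often
the state stream is checkpointed affects how much lineage and intermediate
data accumulates between checkpoints. A minimal sketch of the usual wiring
(paths, source, and intervals are all hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("checkpoint-sketch")
val ssc = new StreamingContext(conf, Seconds(10))     // assumed 10s batches

// Required for stateful operations such as updateStateByKey.
ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")   // hypothetical path

val counts = ssc.socketTextStream("localhost", 9999)  // hypothetical source
  .map(word => (word, 1))
  .updateStateByKey[Int]((values: Seq[Int], prev: Option[Int]) =>
    Some(values.sum + prev.getOrElse(0)))

// Checkpointing the state stream less often than every batch (a common rule
// of thumb is 5-10x the batch interval) cuts down checkpointing overhead.
counts.checkpoint(Seconds(50))

counts.print()
ssc.start()
ssc.awaitTermination()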







Re: Spark Streaming Shuffle to Disk

2015-12-07 Thread Akhil Das
The updateStateByKey state and your batch data could be filling up your
executor memory, and hence it might be hitting the disk. You can verify this
by looking at the memory footprint while your job is running; the executor
logs will also give you a better understanding of what's going on.
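
One thing to check in particular: updateStateByKey keeps every key ever seen
in the state RDD unless the update function removes it, so state can grow
without bound. A rough sketch (the state type and timeout are hypothetical)
of an update function that expires idle keys by returning None, which drops
them from the state:

// State is (running count, last-seen timestamp in millis).
def updateWithExpiry(newValues: Seq[Int],
                     state: Option[(Int, Long)]): Option[(Int, Long)] = {
  val now = System.currentTimeMillis()
  if (newValues.nonEmpty) {
    // Active key: fold the new values into the running count.
    Some((newValues.sum + state.map(_._1).getOrElse(0), now))
  } else if (state.exists(s => now - s._2 < 30 * 60 * 1000L)) {
    // Quiet key seen within the last 30 minutes: keep it as-is.
    state
  } else {
    None  // Idle key: returning None removes it from the state entirely.
  }
}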

Thanks
Best Regards

On Fri, Dec 4, 2015 at 8:24 AM, Steven Pearson wrote:

> I'm running a Spark Streaming job on Spark 1.3.1 that contains an
> updateStateByKey.  The job works perfectly fine, but at some point (after
> a few runs) it starts shuffling to disk no matter how much memory I give
> the executors.
>
> I have tried changing --executor-memory on spark-submit, as well as
> spark.shuffle.memoryFraction, spark.storage.memoryFraction, and
> spark.storage.unrollFraction.  But no matter how I configure these, it
> always spills to disk at around 2.5GB.
>
> What is the best way to avoid spilling shuffle to disk?
>
>



Spark Streaming Shuffle to Disk

2015-12-03 Thread Steven Pearson
I'm running a Spark Streaming job on Spark 1.3.1 that contains an
updateStateByKey.  The job works perfectly fine, but at some point (after
a few runs) it starts shuffling to disk no matter how much memory I give
the executors.

I have tried changing --executor-memory on spark-submit, as well as
spark.shuffle.memoryFraction, spark.storage.memoryFraction, and
spark.storage.unrollFraction.  But no matter how I configure these, it
always spills to disk at around 2.5GB.
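
For concreteness, here is roughly the kind of thing I have been trying (the
values here are purely illustrative; on 1.x these fractions all carve up the
same executor heap, so raising one usually means lowering another):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("shuffle-tuning-sketch")
  .set("spark.shuffle.memoryFraction", "0.4")  // heap share for shuffle buffers (default 0.2)
  .set("spark.storage.memoryFraction", "0.4")  // heap share for cached blocks (default 0.6)
  .set("spark.storage.unrollFraction", "0.2")  // share of storage memory for unrolling (default 0.2)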

What is the best way to avoid spilling shuffle to disk?