Do you have enough disk space for the spill?  It looks like the job has plenty 
of memory reserved but not enough disk for the spill. Each host will need a 
disk that can hold that host's entire data partition. Compressing the spilled 
data saves about 50% in most, if not all, cases.
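
For reference, spill compression is controlled by spark.shuffle.spill.compress, 
which is on by default in recent releases. A minimal PySpark sketch of the 
relevant settings; the codec named here is just an assumption, since the 
default varies by version:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.shuffle.spill.compress", "true")  # compress spilled shuffle data (default: true)
            .set("spark.io.compression.codec", "lz4"))    # assumed codec; snappy/lzf also work
    sc = SparkContext(conf=conf)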

Given the size of the data set, I would consider a 1TB SATA flash drive on each 
host, formatted as EXT4 or XFS, with Spark given exclusive use of it via 
spark.local.dir.  Spilling to disk will slow things down, but the job won’t 
stop.  There are alternatives if you want to discuss them offline.
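
If you go that route, pointing Spark at the dedicated drive is a single 
setting. A sketch, assuming the drive is mounted at /mnt/spark-local (the 
mount point is hypothetical); note that on YARN the node manager's configured 
local directories take precedence over this value:

    from pyspark import SparkConf, SparkContext

    # Use the dedicated drive for shuffle spill and other scratch files.
    conf = SparkConf().set("spark.local.dir", "/mnt/spark-local")  # hypothetical mount point
    sc = SparkContext(conf=conf)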


> On Apr 5, 2016, at 6:37 PM, lllll <lishu...@gmail.com> wrote:
> 
> I have a task that remaps the indices in ALS prediction results back to their
> actual UUIDs, but it consistently fails due to lost executors. I noticed the
> shuffle spill (memory) is large, but I don't know how to reduce it. 
> 
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png> 
> 
> I've tried reducing the number of executors while giving each one more
> memory. 
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png> 
> 
> But it still doesn't seem big enough. I don't know what to do. 
> 
> Below is my code:
> user = load_user()
> product = load_product()
> user.cache()
> product.cache()
> model = load_model(model_path)
> all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> all_prediction = model.predictAll(all_pairs)
> user_reverse = user.map(lambda r: (r[1], r[0]))
> product_reverse = product.map(lambda r: (r[1], r[0]))
> user_reversed = all_prediction.map(lambda u: (u[0], (u[1], u[2]))).join(user_reverse).map(lambda r: (r[1][0][0], (r[1][1], r[1][0][1])))
> both_reversed = user_reversed.join(product_reverse).map(lambda r: (r[1][0][0], r[1][1], r[1][0][1]))
> both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1], x[2])).saveAsTextFile(recommendation_path)
> 
> Both user and product are RDDs of (uuid, index) tuples. 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
