[ https://issues.apache.org/jira/browse/SPARK-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-2723.
---------------------------------
    Resolution: Incomplete

> Block Manager should catch exceptions in putValues
> --------------------------------------------------
>
>                 Key: SPARK-2723
>                 URL: https://issues.apache.org/jira/browse/SPARK-2723
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Shivaram Venkataraman
>            Priority: Major
>              Labels: bulk-closed
>
> The BlockManager should catch exceptions encountered while writing files out
> to disk. Right now these exceptions are counted as user-level task failures,
> and the job is aborted after failing 4 times. We should either fail the
> executor or handle the error in a way that prevents the job from dying.
> I ran into an issue where one disk on a large EC2 cluster failed and this
> caused a long-running job to terminate. Longer term, should we also look at
> blacklisting local directories when one of them becomes unusable?
> Exception pasted below:
> 14/07/29 00:55:39 WARN scheduler.TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: /mnt2/spark/spark-local-20140728175256-e7cb/28/broadcast_264_piece20 (Input/output error)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
>         at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:79)
>         at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:66)
>         at org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:847)
>         at org.apache.spark.storage.MemoryStore$$anonfun$ensureFreeSpace$4.apply(MemoryStore.scala:267)
>         at org.apache.spark.storage.MemoryStore$$anonfun$ensureFreeSpace$4.apply(MemoryStore.scala:256)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.storage.MemoryStore.ensureFreeSpace(MemoryStore.scala:256)
>         at org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:179)
>         at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:76)
>         at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:663)
>         at org.apache.spark.storage.BlockManager.put(BlockManager.scala:574)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:108)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
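The defensive pattern the report asks for — catching I/O errors at the disk-write boundary and classifying them as storage failures rather than letting them bubble up as user-level task failures — can be sketched roughly as follows. This is a minimal, hypothetical illustration (the class and enum names are invented), not Spark's actual BlockManager code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch of the behavior requested in SPARK-2723: catch I/O
// failures where the block is written to disk and report them as a
// storage-level outcome the caller can act on (e.g. fail the executor or
// blacklist the local directory) instead of counting them as task failures.
public class DiskWriteGuard {

    // Outcome of a guarded write: success, or a storage failure that
    // indicates a bad disk rather than a bug in user code.
    public enum WriteStatus { OK, STORAGE_FAILURE }

    public static WriteStatus putBytes(File target, byte[] data) {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(data);
            return WriteStatus.OK;
        } catch (IOException e) {
            // A FileNotFoundException carrying "(Input/output error)", as in
            // the pasted stack trace, lands here too: FileOutputStream.open
            // throws it when the underlying disk fails. Treat it as a disk
            // problem, not a user-level task error.
            return WriteStatus.STORAGE_FAILURE;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("block", ".bin");
        System.out.println(putBytes(tmp, new byte[]{1, 2, 3}));
        // A write into a directory that does not exist stands in for a
        // failed disk here; the error is contained rather than propagated.
        System.out.println(putBytes(new File("/nonexistent-dir-xyz/block.bin"), new byte[]{1}));
        tmp.delete();
    }
}
```

With a status like this surfaced to the scheduler, a failed write could decrement an executor/disk health counter instead of the task-attempt counter, which is what would keep a single bad disk from exhausting the 4 task retries and aborting the whole job.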