I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Hi,
>
> We've been facing "No space left on device" errors from time to time
> lately. The job fails after retries, and obviously in such cases
> retrying won't help.
>
> Sure, the problem is in the datanodes, but I'm wondering whether the
> Spark driver can handle it and decommission the problematic datanode
> before retrying, and maybe dynamically allocate another datanode if
> dynamic allocation is enabled.
>
> I think there needs to be a class of fatal errors that can't be
> recovered from with retries, and it's best if Spark can handle them
> gracefully.
>
> Thanks,
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
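To make the proposal concrete, here is a minimal sketch of what a fatal-error classification with retry short-circuiting could look like. This is purely illustrative: `FATAL_PATTERNS`, `is_fatal`, and `run_with_retries` are hypothetical names, not Spark's actual scheduler API; the only pattern taken from the thread is "No space left on device".

```python
import re

# Hypothetical pattern list of error messages considered fatal, i.e.
# errors for which a retry on the same node cannot succeed. Only the
# "No space left on device" entry comes from the thread above.
FATAL_PATTERNS = [
    r"No space left on device",
]


def is_fatal(message: str) -> bool:
    """Return True if the error message matches a known-fatal pattern."""
    return any(re.search(p, message) for p in FATAL_PATTERNS)


def run_with_retries(task, max_retries=3):
    """Run a task with retries, but give up immediately on fatal errors."""
    last_error = None
    for _ in range(max_retries):
        try:
            return task()
        except IOError as e:
            if is_fatal(str(e)):
                # Fatal error: retrying cannot help, so re-raise at once
                # (in the proposal, this is where the driver would instead
                # decommission the bad datanode before rescheduling).
                raise
            last_error = e  # transient error: retry
    raise RuntimeError("task failed after %d retries" % max_retries) from last_error
```

The point of the sketch is the early `raise` on a fatal match: a disk-full task fails once instead of burning through the retry budget, which mirrors the behavior Jianshi is asking for.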