The laziness is hard to deal with in these situations. I would suggest handling expected cases such as "FileNotFound" by other means before even starting a Spark job. If you really want to try..catch a specific portion of a Spark job, one way is to follow it with an action inside the try block, so that the evaluation (and any exception it raises) happens there. You can even call persist() before the action, so that you can re-use the RDD afterwards without recomputing it.
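Something along these lines is what I have in mind. Treat it as a rough sketch rather than a definitive implementation: SafeTextFile and loadOrEmpty are names I made up for illustration, count() is just one possible action, and I am assuming MEMORY_ONLY is an acceptable storage level for your data.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import scala.util.control.NonFatal

object SafeTextFile {
  // Hypothetical helper (not part of Spark): returns the lines found at
  // inputPath, or an empty RDD if evaluating the read fails.
  def loadOrEmpty(sc: SparkContext, inputPath: String): RDD[String] = {
    // Lazy: this only builds the RDD lineage, nothing is read yet.
    val lines = sc.textFile(inputPath, 1)
    try {
      // Mark the RDD for caching so the forced evaluation below is reused.
      lines.persist(StorageLevel.MEMORY_ONLY)
      // Action: forces evaluation inside the try block, so a missing or
      // unreadable input path surfaces here and can be caught.
      lines.count()
      lines
    } catch {
      case NonFatal(e) =>
        lines.unpersist()
        System.err.println(s"could not read $inputPath: ${e.getMessage}")
        sc.emptyRDD[String]
    }
  }
}
```

With this, SafeTextFile.loadOrEmpty(sparkContext, inputPath) should give you an empty RDD for an invalid path instead of an unhandled exception later on. The price is one extra pass over the input to run count(), which persist() helps amortize if you use the RDD again.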
Best,
Burak

On Mon, Aug 24, 2015 at 10:52 AM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:

> Hi Burak, thanks for your answer.
>
> I have a "new MyResultFunction()(sparkContext, inputPath).collect" in the
> unit test (so to evaluate the actual result), and there I can observe and
> catch the exception. Even considering Spark's laziness, shouldn't I catch
> the exception while occurring in the try..catch statement that encloses the
> textFile invocation?
>
> Best,
> Roberto
>
>
> On Mon, Aug 24, 2015 at 7:38 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> textFile is a lazy operation. It doesn't evaluate until you call an
>> action on it, such as .count(). Therefore, you won't catch the exception
>> there.
>>
>> Best,
>> Burak
>>
>> On Mon, Aug 24, 2015 at 9:09 AM, Roberto Coluccio
>> <roberto.coluc...@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>> I'm experiencing an unexpected behaviour, that suggests me thinking
>>> about my missing notions on how Spark works. Let's say I have a Spark
>>> driver that invokes a function like:
>>>
>>> ----- in myDriver -----
>>>
>>> val sparkContext = new SparkContext(mySparkConf)
>>> val inputPath = "file://home/myUser/project/resources/date=*/*"
>>>
>>> val myResult = new MyResultFunction()(sparkContext, inputPath)
>>>
>>> ----- in MyResultFunctionOverRDD ------
>>>
>>> class MyResultFunction extends Function2[SparkContext, String, RDD[String]] with Serializable {
>>>   override def apply(sparkContext: SparkContext, inputPath: String): RDD[String] = {
>>>     try {
>>>       sparkContext.textFile(inputPath, 1)
>>>     } catch {
>>>       case t: Throwable => {
>>>         myLogger.error(s"error: ${t.getStackTraceString}\n")
>>>         sc.makeRDD(Seq[String]())
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> What happens is that I'm *unable to catch exceptions* thrown by the
>>> "textFile" method within the try..catch clause in MyResultFunction. In
>>> fact, in a unit test for that function where I call it passing an invalid
>>> "inputPath", I don't get an empty RDD as result, but the unit test exits
>>> (and fails) due to exception not handled.
>>>
>>> What am I missing here?
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Roberto