Laziness makes these situations hard to deal with. I would suggest
checking for expected failures such as "FileNotFound" by other means
before even starting the Spark job. If you really want to try..catch a
specific portion of a Spark job, one way is to follow it with an action
inside the try block, since the action forces evaluation. You can also call
persist() before the action, so that the RDD is not recomputed when you
re-use it.
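
A minimal sketch of that second approach, assuming a local master and a
hypothetical helper named loadOrEmpty (neither is from the original code). It
persists the RDD, forces evaluation with count() inside the try block, and
falls back to an empty RDD if the read fails:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.util.control.NonFatal

object SafeTextFile {
  // Hypothetical helper: returns the lines at inputPath, or an empty RDD
  // if the input cannot be read.
  def loadOrEmpty(sc: SparkContext, inputPath: String): RDD[String] = {
    // textFile is lazy: nothing is read here. persist() only marks the RDD
    // for caching once an action computes it.
    val rdd = sc.textFile(inputPath, 1).persist()
    try {
      rdd.count() // action: forces evaluation, so read errors surface here
      rdd
    } catch {
      case NonFatal(e) =>
        rdd.unpersist()
        println(s"error reading $inputPath: ${e.getMessage}")
        sc.emptyRDD[String]
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("safe-text-file").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val lines = loadOrEmpty(sc, "file:///path/that/does/not/exist")
      println(lines.count())
    } finally {
      sc.stop()
    }
  }
}
```

The count() does cost an extra pass over the input, which is why persist()
is worth calling first when the RDD is used again afterwards.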

Best,
Burak

On Mon, Aug 24, 2015 at 10:52 AM, Roberto Coluccio <
roberto.coluc...@gmail.com> wrote:

> Hi Burak, thanks for your answer.
>
> I have a "new MyResultFunction()(sparkContext, inputPath).collect" in the
> unit test (so as to evaluate the actual result), and there I can observe and
> catch the exception. Even considering Spark's laziness, shouldn't the
> exception be caught by the try..catch statement that encloses the
> textFile invocation?
>
> Best,
> Roberto
>
>
> On Mon, Aug 24, 2015 at 7:38 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> textFile is a lazy operation. It doesn't evaluate until you call an
>> action on it, such as .count(). Therefore, you won't catch the exception
>> there.
>>
>> Best,
>> Burak
>>
>> On Mon, Aug 24, 2015 at 9:09 AM, Roberto Coluccio <
>> roberto.coluc...@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>> I'm seeing unexpected behaviour, which suggests that I'm missing
>>> something about how Spark works. Let's say I have a Spark
>>> driver that invokes a function like:
>>>
>>> ----- in myDriver -----
>>>
>>> val sparkContext = new SparkContext(mySparkConf)
>>> val inputPath = "file://home/myUser/project/resources/date=*/*"
>>>
>>> val myResult = new MyResultFunction()(sparkContext, inputPath)
>>>
>>> ----- in MyResultFunctionOverRDD ------
>>>
>>> class MyResultFunction extends Function2[SparkContext, String,
>>> RDD[String]] with Serializable {
>>>   override def apply(sparkContext: SparkContext, inputPath: String):
>>> RDD[String] = {
>>>     try {
>>>       sparkContext.textFile(inputPath, 1)
>>>     } catch {
>>>       case t: Throwable => {
>>>         myLogger.error(s"error: ${t.getStackTraceString}\n")
>>>         sparkContext.makeRDD(Seq[String]())
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> What happens is that I'm *unable to catch exceptions* thrown by the
>>> "textFile" method within the try..catch clause in MyResultFunction. In
>>> fact, in a unit test for that function where I call it with an invalid
>>> "inputPath", I don't get an empty RDD as the result; instead the unit test
>>> exits (and fails) because of an unhandled exception.
>>>
>>> What am I missing here?
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Roberto
>>>
>>
>>
>
