Re: Implementing fail-fast upon critical spark streaming tasks errors

2015-12-07 Thread Cody Koeninger
Personally, for jobs that I care about I store offsets in transactional
storage rather than checkpoints, which eliminates that problem (just
enforce whatever constraints you want when storing offsets).
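A minimal sketch of that pattern, with plain Python and sqlite3 standing in for the transactional store (the table layout and helper names here are hypothetical, not Spark or Kafka API): results and offsets are written in the same transaction, so a failed batch leaves the stored offsets untouched.

```python
import sqlite3

def init(conn):
    # Results and offsets live in the same database so one
    # transaction can cover both.
    conn.execute("CREATE TABLE IF NOT EXISTS results (k TEXT PRIMARY KEY, v INTEGER)")
    conn.execute("""CREATE TABLE IF NOT EXISTS offsets
                    (topic TEXT, part INTEGER, off INTEGER,
                     PRIMARY KEY (topic, part))""")

def commit_batch(conn, rows, new_offsets):
    # One transaction, all-or-nothing: sqlite3's connection context
    # manager commits on success and rolls back on any exception.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", rows)
        conn.executemany("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)", new_offsets)

def last_offset(conn, topic, part):
    # On restart, the driver reads these back and builds the direct
    # stream starting from the stored positions.
    row = conn.execute("SELECT off FROM offsets WHERE topic=? AND part=?",
                       (topic, part)).fetchone()
    return row[0] if row else 0
```

Because the offsets only advance when the results land, a crash or task failure mid-batch simply means the same offset range is reprocessed on restart; any constraints you want (e.g. offsets must increase monotonically) can be enforced inside the same transaction.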

Regarding the question of communicating errors back to the
StreamingListener: there is an onReceiverError callback, but the direct
stream isn't a receiver, and I'm not sure it would be appropriate to change
the direct stream to use that as a means of communication.  Maybe TD can
chime in if you get his attention.


On Sun, Dec 6, 2015 at 9:11 AM, yam <yoel.am...@playtech.com> wrote:

> When a Spark Streaming task fails (after exceeding
> spark.task.maxFailures), the related batch job is considered failed and the
> driver continues to the next batch in the pipeline after advancing the
> checkpoint to the next positions (the new offsets when using the Kafka
> direct stream).
>
> I'm looking for a fail-fast implementation: when one or more (or all) tasks
> fail (a critical error occurs, there is no point in progressing to the next
> batch, and unprocessed data should not be lost), detect this condition and
> exit the application before committing the updated checkpoint.
>
> Any ideas? More specifically:
>
> 1. Is there a way for the driver program to be notified of a job failure
> (which tasks failed / RDD metadata) before the checkpoint is updated?
> StreamingContext has an addStreamingListener method, but its onBatchCompleted
> event has no indication of batch failure (only completion).
>
> 2. Stopping and exiting the application: when running in yarn-client mode,
> calling streamingContext.stop halts processing, but the process does not
> exit.
>
> Thanks!
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-fail-fast-upon-critical-spark-streaming-tasks-errors-tp25606.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Implementing fail-fast upon critical spark streaming tasks errors

2015-12-06 Thread yam
When a Spark Streaming task fails (after exceeding
spark.task.maxFailures), the related batch job is considered failed and the
driver continues to the next batch in the pipeline after advancing the
checkpoint to the next positions (the new offsets when using the Kafka
direct stream).

I'm looking for a fail-fast implementation: when one or more (or all) tasks
fail (a critical error occurs, there is no point in progressing to the next
batch, and unprocessed data should not be lost), detect this condition and
exit the application before committing the updated checkpoint.

Any ideas? More specifically:

1. Is there a way for the driver program to be notified of a job failure
(which tasks failed / RDD metadata) before the checkpoint is updated?
StreamingContext has an addStreamingListener method, but its onBatchCompleted
event has no indication of batch failure (only completion).

2. Stopping and exiting the application: when running in yarn-client mode,
calling streamingContext.stop halts processing, but the process does not
exit.
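The behavior asked for above can be sketched as a driver-side control loop (plain Python with hypothetical helper names, not actual Spark API): the stored offsets advance only after a batch succeeds, and any task failure aborts the loop before the commit, so the failed batch's data is not lost.

```python
def run_stream(batches, process, commit_offsets):
    """Fail-fast batch loop sketch.

    batches: iterable of (data, end_offset) pairs
    process: processes one batch; raises on a critical task error
    commit_offsets: persists the checkpoint/offset for a finished batch
    Returns the last successfully committed offset.
    """
    committed = None
    for data, end_offset in batches:
        try:
            process(data)                # may raise on a critical error
        except Exception as exc:
            # Fail fast: do NOT commit this batch's offsets; surface
            # the error so the driver can exit.
            raise RuntimeError(f"batch failed at offset {end_offset}") from exc
        commit_offsets(end_offset)       # checkpoint only after success
        committed = end_offset
    return committed
```

On restart, the stream would be rebuilt from the last committed offset, so the failed batch is reprocessed rather than skipped.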

Thanks!