stopping a process using an RDD

2016-01-04 Thread domibd
Hello,

Is there a way to stop a process (like map-reduce) over an RDD when a
condition is met?

(This could be useful when the process does not always need to explore the
whole RDD.)

thanks

Dominique








Re: stopping a process using an RDD

2016-01-04 Thread Michael Segel
Not really a good idea. 

It breaks the paradigm. 

If I understand the OP's idea, they want to halt processing of the RDD, but
not the entire job: when a certain condition is hit, the current task would
stop, yet processing would continue on the remaining partitions (assuming you
have more partitions than task 'slots'). The catch is that the only way to
stop a task mid-flight is to fail it, and if you fail enough tasks, the whole
job fails, meaning you don't get any results.

The best you could do is a NOOP. That is, once your condition is met on that
partition, your map/reduce job simply stops emitting output, so no more data
is added to the result set even though every partition still gets processed.
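
A minimal sketch of that in Scala (shouldSkip is a made-up predicate standing
in for whatever your real stopping condition is):

import org.apache.spark.{SparkConf, SparkContext}

object NoopSketch {
  // Made-up predicate: true once we no longer want results from a
  // record (an assumption; substitute your actual stopping condition).
  def shouldSkip(x: Int): Boolean = x > 100

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("noop-sketch"))
    val rdd = sc.parallelize(1 to 1000)

    // flatMap lets each record emit zero or one results: once the
    // condition is met the record contributes nothing, so the job
    // still visits every partition but the result set stops growing.
    val doubled = rdd.flatMap { x =>
      if (shouldSkip(x)) Iterator.empty else Iterator(x * 2)
    }

    println(doubled.count())
    sc.stop()
  }
}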

The whole paradigm is built around processing the entire RDD.

You may spin cycles on records you no longer care about, but that's not
really a bad thing.

HTH

-Mike

> On Jan 4, 2016, at 6:45 AM, Daniel Darabos <daniel.dara...@lynxanalytics.com> 
> wrote:
> 
> You can cause a failure by throwing an exception in the code running on the 
> executors. The task will be retried (if spark.task.maxFailures > 1), and then 
> the stage is failed. No further tasks are processed after that, and an 
> exception is thrown on the driver. You could catch the exception and see if 
> it was caused by your own special exception.



Re: stopping a process using an RDD

2016-01-04 Thread Daniel Darabos
You can cause a failure by throwing an exception in the code running on the
executors. The task will be retried (if spark.task.maxFailures > 1), and
then the stage is failed. No further tasks are processed after that, and an
exception is thrown on the driver. You could catch the exception and see if
it was caused by your own special exception.
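
Roughly like this in Scala (a sketch only: StopEarlyException is a marker
exception I'm inventing here, and I set spark.task.maxFailures to 1 so the
first hit fails the stage instead of being retried):

import org.apache.spark.{SparkConf, SparkContext, SparkException}

// Made-up marker exception; not part of Spark's API.
class StopEarlyException(msg: String) extends RuntimeException(msg)

object FailFastSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fail-fast-sketch")
      .set("spark.task.maxFailures", "1") // fail fast, no retries
    val sc = new SparkContext(conf)

    try {
      sc.parallelize(1 to 1000000).map { x =>
        // Throwing on an executor fails the task, which (with no
        // retries) fails the stage and surfaces on the driver.
        if (x == 424242) throw new StopEarlyException(s"stop condition hit at $x")
        x * 2
      }.count()
    } catch {
      // Spark wraps the task failure in a SparkException whose message
      // includes the original exception; checking the message text is a
      // pragmatic way to recognize our own marker.
      case e: SparkException if e.getMessage.contains("StopEarlyException") =>
        println("Computation stopped early by our own condition.")
    } finally {
      sc.stop()
    }
  }
}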

On Mon, Jan 4, 2016 at 1:05 PM, domibd <d...@lipn.univ-paris13.fr> wrote:

> Hello,
>
> Is there a way to stop a process (like map-reduce) over an RDD when a
> condition is met?
>
> (This could be useful when the process does not always need to explore the
> whole RDD.)
>
> thanks
>
> Dominique