On Wed, Jun 8, 2016 at 3:42 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> On Wed, Jun 8, 2016 at 2:38 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
> > I am looking to write an ETL job using Spark that reads data from the
> > source, performs transformations, and inserts it into the destination.
>
> Is this going to be a one-time job, or do you want it to run at regular
> intervals?
>
> > 1. The source becomes slow or unresponsive. How can such a situation be
> > controlled so that it doesn't cause a DDoS on the source?
>
> Why do you think Spark would DDoS the source? I'm reading it as if
> Spark tried to open a new connection after the currently-open one
> became slow. I don't think that's how Spark handles connections. What is
> the source in your use case?
>

>> I was primarily concerned about retry storms causing a DDoS on the
source. How does Spark deal with a scenario where it gets a timeout from the
source? Does it retry or does it fail? And if a task fails, does it fail
the whole job? Is it possible to restart the job and process only the failed
tasks and the remaining pending tasks? My use case is reading from
Cassandra, performing some transformation, and saving the data to a different
Cassandra cluster. I want to make sure that the data is copied reliably
without missing anything. At the same time, I also want to make sure the
process doesn't impact other live production traffic to these clusters when
there are failures, e.g. DDoS or retry storms.


> > Also, at the same time, how do I make it resilient so that it picks up
> > from where it left off?
>
> It sounds like checkpointing. It's available in Core and Streaming.
> So, what's your source and how often do you want to query for data?
> You may also benefit from the recent additions to Spark in 2.0 called
> Structured Streaming (aka Streaming Datasets) - see
> https://issues.apache.org/jira/browse/SPARK-8360.
>
>
>> Does checkpointing help with the failure scenario that I described
above? I read checkpointing as a way to restart processing of data if tasks
fail because of Spark cluster issues. Does it also work in the scenario
that I described?
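As I understand it, RDD checkpointing mainly truncates lineage so that lost partitions can be recovered without recomputing all the way from the source; it does not by itself let a fresh run of a failed job resume where the previous one stopped. For that, the usual approach is idempotent writes plus your own progress tracking. A rough sketch of the idea (plain Python with hypothetical helpers, not Spark API):

```python
def copy_partitions(partitions, copy_one, done):
    """Copy each (partition_id, data) pair exactly once across restarts.
    `done` is a set of completed partition IDs persisted somewhere durable
    (e.g. a status table in the destination cluster)."""
    for pid, data in partitions:
        if pid in done:
            continue          # already copied in a previous run: skip
        copy_one(pid, data)   # must be idempotent in case we crash mid-write
        done.add(pid)         # record progress only after a successful copy
```

On restart you reload `done` and rerun the same job: completed partitions are skipped, and only the failed and pending ones are processed, which is the "pick up where it left off" behavior asked about above.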


> > 2. In the same context when destination becomes slow or un-responsive.
>
> What is the destination? It appears as if you were doing streaming and
> want to use checkpointing and back-pressure, but you haven't said enough
> about your use case for me to be precise.
>
> Jacek
>
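On the throttling side, both Spark and the DataStax spark-cassandra-connector expose settings that bound how hard a job can hit the clusters. The property names below are from the Spark 1.6/2.0 and connector 1.6 era; verify them against the versions you actually run:

```properties
# Spark core: how many times a task is retried before the job is failed
spark.task.maxFailures                        4

# spark-cassandra-connector: throttle writes to the destination cluster
spark.cassandra.output.throughput_mb_per_sec  5

# spark-cassandra-connector: smaller input splits mean gentler reads
spark.cassandra.input.split.size_in_mb        64

# Spark Streaming only: rate-limit ingestion based on observed batch latency
spark.streaming.backpressure.enabled          true
```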
