Re: [Kubernetes] structured-streaming driver restarts / roadmap

2018-04-30 Thread Oz Ben Ami
This would be useful to us, so I've created a JIRA ticket for this
discussion: https://issues.apache.org/jira/browse/SPARK-24122



Re: [Kubernetes] structured-streaming driver restarts / roadmap

2018-03-28 Thread Anirudh Ramanathan
We discussed this early on in our fork and I think we should have this in a
JIRA and discuss it further. It's something we want to address in the
future.

One proposed method is using a StatefulSet of size 1 for the driver. This
ensures recovery, but at the cost of the completion semantics of a single
pod: a StatefulSet recreates its pod even after a successful exit.


See history in https://github.com/apache-spark-on-k8s/spark/issues/288
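
A minimal sketch of what the StatefulSet-of-size-1 approach might look like
(names, image, and args here are illustrative placeholders, not from any
actual proposal):

```yaml
# Hypothetical sketch: a StatefulSet with replicas: 1 wrapping the driver.
# Kubernetes recreates the pod on failure, but also after a successful exit,
# which is why this approach weakens completion semantics.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spark-driver            # illustrative name
spec:
  serviceName: spark-driver
  replicas: 1                   # exactly one driver pod at a time
  selector:
    matchLabels:
      app: spark-driver
  template:
    metadata:
      labels:
        app: spark-driver
    spec:
      containers:
      - name: driver
        image: spark:latest     # placeholder image
        # args would carry the original spark-submit driver arguments
```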



[Kubernetes] structured-streaming driver restarts / roadmap

2018-03-28 Thread Lucas Kacher
A carry-over from the apache-spark-on-k8s project, it would be useful to
have a configurable restart policy for submitted jobs with the Kubernetes
resource manager. See the following issues:

https://github.com/apache-spark-on-k8s/spark/issues/133
https://github.com/apache-spark-on-k8s/spark/issues/288
https://github.com/apache-spark-on-k8s/spark/issues/546

Use case: I have a structured streaming job, deployed via k8s, that reads
from Kafka, aggregates, and writes back out to Kafka, checkpointing to a
remote location. If the driver pod dies for any number of reasons, it will
not restart.

For us, since all state is recoverable from the checkpoint and we are
satisfied with at-least-once semantics, it would be useful if the driver
came back up on its own and picked up where it left off.
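
For concreteness, the kind of job described might be sketched as follows
(a hypothetical pyspark sketch: the topic names, broker address, and the
windowed count are illustrative assumptions, not our actual pipeline):

```python
def build_streaming_query(spark, checkpoint_dir):
    """Sketch of a Kafka -> aggregate -> Kafka structured streaming job.

    `spark` is an existing SparkSession; `checkpoint_dir` should point at a
    remote location (e.g. an object store) so offsets and state survive
    driver restarts. Topic names and the aggregation are placeholders.
    """
    from pyspark.sql.functions import col, count, window

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder
        .option("subscribe", "events-in")                 # placeholder topic
        .load()
    )

    counts = (
        events
        .selectExpr("CAST(value AS STRING) AS value", "timestamp")
        .groupBy(window(col("timestamp"), "1 minute"))
        .agg(count("*").alias("n"))
    )

    # On restart, the query resumes from the offsets recorded in the
    # checkpoint, which is what yields at-least-once delivery downstream.
    return (
        counts
        .selectExpr("CAST(window AS STRING) AS key",
                    "CAST(n AS STRING) AS value")
        .writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder
        .option("topic", "events-out")                    # placeholder topic
        .option("checkpointLocation", checkpoint_dir)
        .outputMode("update")
        .start()
    )
```

Restarting the driver pod is sufficient for recovery precisely because all
progress tracking lives in `checkpointLocation`, not in the pod itself.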

Firstly, may we add this to JIRA? Secondly, is there any insight into
whether this will be made configurable in the future? If it will not happen
natively, we will end up needing to write something that polls or listens
to k8s and has the ability to re-submit any failed jobs.
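
Absent native support, that polling workaround might look roughly like this
(a stdlib-only sketch that shells out to kubectl; the namespace, label
selector, and the commented-out resubmit command are illustrative
assumptions):

```python
import json
import subprocess
import time


def should_resubmit(phase):
    """Decide whether a driver pod in the given phase needs resubmission.

    Only "Failed" warrants a restart; "Succeeded" means the job completed
    and restarting it would break completion semantics.
    """
    return phase == "Failed"


def poll_and_resubmit(namespace="spark", selector="spark-role=driver",
                      interval_s=30):
    """Poll k8s for failed driver pods and re-run spark-submit for each.

    A real implementation would need to carry the original submission
    arguments along, e.g. in an annotation on the driver pod.
    """
    while True:
        out = subprocess.check_output(
            ["kubectl", "get", "pods", "-n", namespace,
             "-l", selector, "-o", "json"])
        for pod in json.loads(out)["items"]:
            if should_resubmit(pod["status"]["phase"]):
                name = pod["metadata"]["name"]
                print(f"driver {name} failed; resubmitting")
                # Placeholder: re-run the original spark-submit for this job.
                # subprocess.run(["spark-submit", ...])
        time.sleep(interval_s)
```

Watching (`kubectl get pods --watch`, or the watch API) would react faster
than polling, at the cost of handling reconnects.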

Thanks!

-- 

*Lucas Kacher*
Senior Engineer
-
vsco.co 
New York, NY
818.512.5239