Hi,

We have a Spark job that calls a microservice from the lambda function of
a flatMap transformation: the lambda passes the inbound element to the
microservice and returns the transformed value, or None, as the output of
the flatMap. The lambda also takes care of exceptions from the
microservice, of course.
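
For concreteness, the lambda looks roughly like the sketch below. All of
the names, types and the URL are placeholders rather than our real code;
it just shows the shape of the call:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Placeholder types; the real job uses different element/result types.
case class Event(payload: String)
case class Enriched(payload: String)

object ServiceClient {
  // One HTTP client per executor JVM.
  private lazy val http = HttpClient.newHttpClient()
  // Placeholder URL.
  private val serviceUrl = "http://enricher:8080/transform"

  // Calls the microservice; returns None when the service rejects the
  // element. Connection-level exceptions propagate to the flatMap lambda.
  def transform(e: Event): Option[Enriched] = {
    val request = HttpRequest.newBuilder(URI.create(serviceUrl))
      .POST(HttpRequest.BodyPublishers.ofString(e.payload))
      .build()
    val resp = http.send(request, HttpResponse.BodyHandlers.ofString())
    if (resp.statusCode() == 200) Some(Enriched(resp.body())) else None
  }

  // Placeholder for the DLQ producer we call today on any exception.
  def sendToDlq(e: Event, ex: Throwable): Unit = ()
}

// Today's lambda, roughly:
//   events.flatMap { e =>
//     try ServiceClient.transform(e)
//     catch { case ex: Exception => ServiceClient.sendToDlq(e, ex); None }
//   }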

The question is this: there are times when the microservice is down, and
there is no point recording an exception and putting a message in the DLQ
for every element in our streaming pipeline for as long as the outage
lasts. Instead, what we want is to retry the microservice call for a given
event a predefined number of times and, if the service still appears to be
down, terminate the Spark job, so that the current microbatch ends, no
further microbatch is started, and the remaining messages stay in the
source Kafka topics, unpolled and unprocessed, until the microservice is
back up and the Spark job is redeployed.

In regular microservices we can implement this with the Circuit Breaker
pattern. In a Spark job, however, it would mean being able to somehow send
a signal from an executor JVM to the driver JVM to terminate the Spark
job. Is there a way to do that in Spark?
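
This is roughly the behaviour we are after inside the lambda, reusing the
placeholder names from the sketch above (maxAttempts and backoffMs are
made-up config values):

import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object CircuitBreakingTransform {

  // Retries the microservice call a fixed number of times; if it still
  // fails, the service is considered down and we want the whole job to
  // stop instead of sending the element to the DLQ.
  def transformWithRetry(e: Event,
                         maxAttempts: Int = 3,
                         backoffMs: Long = 1000L): Option[Enriched] = {

    @tailrec
    def attempt(remaining: Int): Option[Enriched] =
      Try(ServiceClient.transform(e)) match {
        case Success(result) =>
          result                    // normal path: Some(enriched) or None
        case Failure(_) if remaining > 1 =>
          Thread.sleep(backoffMs)   // simple fixed backoff, for the sketch
          attempt(remaining - 1)
        case Failure(ex) =>
          // "Circuit open": this is an infra/network problem, not a data
          // problem. What we want here is to terminate the Spark job from
          // this executor so that no further microbatch runs and the
          // remaining Kafka messages stay unpolled.
          throw ex                  // today, all we can do is fail the task
      }

    attempt(maxAttempts)
  }
}

The open question is what can go in place of that final throw so that the
driver actually stops the job, rather than just rescheduling the failed
task.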

P.S.:
- Having the circuit breaker in place restricts the purpose of the DLQ to
data or schema issues only, instead of infra/network issues.
- As for why the Spark job needs to call a microservice at all: think of
it as complex logic maintained in a microservice that does not warrant
duplication.
- Checkpointing is handled manually, not via Spark's default checkpointing
mechanism.

Regards,
Sheel
