If your operations are idempotent, you should be able to just run a
totally separate job that looks for failed batches and uses a KafkaRDD
to reprocess them. C* probably isn't the first choice for what
is essentially a queue, but if the frequency of failed batches is
relatively low it probably would be fine.
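A minimal sketch of the bookkeeping such a repair job might do, assuming the failed batches were recorded as (topic, partition, from_offset, until_offset) rows. The `FailedBatch` record and `merge_ranges` helper are illustrative names of my own, not anything from this thread or the Spark API:

```python
from collections import namedtuple

# Illustrative record of one failed batch's Kafka offset range.
FailedBatch = namedtuple("FailedBatch",
                         "topic partition from_offset until_offset")

def merge_ranges(failed):
    """Coalesce contiguous or overlapping failed offset ranges per
    (topic, partition), so each partition is re-read in as few scans
    as possible."""
    by_tp = {}
    for b in failed:
        by_tp.setdefault((b.topic, b.partition), []).append(b)
    merged = []
    for _, batches in sorted(by_tp.items()):
        batches.sort(key=lambda b: b.from_offset)
        cur = batches[0]
        for b in batches[1:]:
            if b.from_offset <= cur.until_offset:  # contiguous/overlapping
                cur = cur._replace(
                    until_offset=max(cur.until_offset, b.until_offset))
            else:
                merged.append(cur)
                cur = b
        merged.append(cur)
    return merged
```

In the real repair job, each merged range would be turned into an offset range for a KafkaRDD (e.g. via the Kafka integration's createRDD), and the same idempotent write to the downstream store would simply be replayed.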
>
> Personally I think forcing the stream to fail (e.g. check offsets in
> downstream store and throw exception if they aren't as expected) is
> the safest thing to do.
I would think so too, but for just 2-3 (sometimes just 1) failed
batches in a whole day, I am trying not to kill the whole stream.
Personally I think forcing the stream to fail (e.g. check offsets in
downstream store and throw exception if they aren't as expected) is
the safest thing to do.
If you proceed after a failure, you need a place to reliably record
the batches that failed for later processing.
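The "check offsets and fail fast" idea above can be sketched as follows, assuming the downstream store can report the last offset it committed per topic-partition. All names here (`verify_offset`, `OffsetMismatch`) are illustrative, not part of any real API:

```python
class OffsetMismatch(Exception):
    """Raised when the downstream store's committed offset does not
    match the start of the batch about to be processed."""
    pass

def verify_offset(topic, partition, expected_from, last_committed):
    """Before writing a batch whose Kafka range starts at expected_from,
    compare with what the store says was last committed; raise so the
    whole stream fails rather than silently skipping or double-writing
    data."""
    if last_committed != expected_from:
        raise OffsetMismatch(
            f"{topic}-{partition}: expected offset {expected_from}, "
            f"store has {last_committed}")
```

If you instead choose to proceed past a failure, the same check is the natural place to append the mismatched range to a reliable "failed batches" table for the later repair job.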
On Wed, Dec 7, 2016
Hi,
I am trying to solve this problem: in my streaming flow, every day a few
jobs fail for a few batches due to mostly unavoidable reasons (say, Kafka
cluster maintenance etc.) and then resume successfully.
I want to reprocess those failed jobs programmatically (assume I have a way
of getting