Re: Reprocessing failed jobs in Streaming job

2016-12-07 Thread Cody Koeninger
If your operations are idempotent, you should be able to just run a totally separate job that looks for failed batches and uses a KafkaRDD to reprocess each batch. C* probably isn't the first choice for what is essentially a queue, but if the frequency of batches is relatively low it probably…
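The "separate repair job" idea above can be sketched in a few lines. This is an illustrative Python sketch, not Spark code: `reprocess_failed` and the shape of an offset range are hypothetical stand-ins. In a real pipeline the replay step would build a KafkaRDD from the recorded OffsetRanges and run the same idempotent write, so replaying a batch twice is harmless.

```python
def reprocess_failed(failed_ranges, process_fn):
    """Repair job: replay each recorded failed offset range through the
    same idempotent processing function. Returns the ranges that still
    fail, so they stay queued for the next repair run."""
    still_failing = []
    for rng in failed_ranges:
        try:
            # In a real job this would read the range back from Kafka
            # (e.g. a KafkaRDD over the stored OffsetRange) and rewrite it.
            process_fn(rng)
        except Exception:
            still_failing.append(rng)
    return still_failing
```

Because the processing is idempotent, the repair job can run on a schedule and simply retry everything still in the failed-batch store.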

Re: Reprocessing failed jobs in Streaming job

2016-12-07 Thread map reduced
> Personally I think forcing the stream to fail (e.g. check offsets in
> downstream store and throw exception if they aren't as expected) is
> the safest thing to do.

I would think so too, but just for say 2-3 (sometimes just 1) failed batches in a whole day, I am trying to not kill the whole…

Re: Reprocessing failed jobs in Streaming job

2016-12-07 Thread Cody Koeninger
Personally I think forcing the stream to fail (e.g. check offsets in the downstream store and throw an exception if they aren't as expected) is the safest thing to do. If you proceed after a failure, you need a place to reliably record the batches that failed for later processing.
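The "check offsets in the downstream store" guard suggested here can be sketched as a fail-fast comparison. A minimal Python sketch, assuming offsets are keyed by (topic, partition); `verify_offsets` is a hypothetical name, and in practice the expected values would come from the batch's OffsetRanges while the stored values come from whatever store the job commits to (e.g. C*).

```python
def verify_offsets(expected, stored):
    """Fail fast before processing a batch: for every (topic, partition),
    the offset committed downstream must equal the offset this batch
    expects to start from. A mismatch means some earlier batch was lost
    or duplicated, so we raise rather than silently proceed."""
    for key, from_offset in expected.items():
        found = stored.get(key)
        if found != from_offset:
            topic, partition = key
            raise RuntimeError(
                "Offset mismatch for %s-%d: expected %d, found %r"
                % (topic, partition, from_offset, found))
```

Throwing here kills the stream, which is the point: the operator (or supervisor) restarts it from a known-good state instead of compounding the gap.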

Reprocessing failed jobs in Streaming job

2016-12-07 Thread map reduced
Hi, I am trying to solve this problem: in my streaming flow, a few jobs fail every day for a few batches due to mostly unavoidable reasons (say, Kafka cluster maintenance etc.) and then resume successfully. I want to reprocess those failed jobs programmatically (assume I have a way of getting…
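The prerequisite the question assumes, a way of knowing which batches failed, usually means recording each batch's offset range when its processing throws. A minimal Python sketch of that bookkeeping; `FailedBatchLog` and `process_batch` are hypothetical names, and a real job would persist the log durably (the thread mentions C*) from inside `foreachRDD` rather than keep it in memory.

```python
class FailedBatchLog:
    """Store of failed batches. In-memory for illustration only; a real
    pipeline would write these offset ranges to a durable store."""
    def __init__(self):
        self.failed = []

    def record(self, offset_ranges):
        self.failed.append(offset_ranges)


def process_batch(offset_ranges, process_fn, log):
    """Process one micro-batch; on failure, record its offset ranges for
    later reprocessing instead of killing the whole stream."""
    try:
        process_fn(offset_ranges)
        return True
    except Exception:
        log.record(offset_ranges)
        return False
```

With this in place, the separate repair job only has to scan the log and replay the recorded ranges; as the replies note, this is only safe if the processing is idempotent.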