Hi Cody,
That is clear. Thanks!
Bill
On Tue, May 19, 2015 at 1:27 PM, Cody Koeninger c...@koeninger.org wrote:
If you checkpoint, the job will start from the successfully consumed
offsets. If you don't checkpoint, by default it will start from the
highest available offset, and you will potentially lose data.
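The two restart behaviors described above can be sketched as a small simulation (plain Python, not Spark code; the message log and offsets are illustrative):

```python
# Illustrates the restart semantics described above: resume from the
# checkpointed offset if one exists, otherwise start from the highest
# available offset and skip anything produced while the job was down.

def restart_position(checkpoint, latest_offset):
    """Offset to resume from: the checkpoint if one exists, else the
    highest available offset."""
    return checkpoint if checkpoint is not None else latest_offset

log = ["msg-%d" % i for i in range(10)]   # messages at offsets 0..9

# With a checkpoint at offset 4, the restarted job replays offsets 4..9.
resumed = log[restart_position(4, len(log)):]
print(resumed)    # six messages, nothing lost

# Without a checkpoint, the job starts at the latest offset; the messages
# produced during the downtime (offsets 4..9 here) are never consumed.
skipped = log[restart_position(None, len(log)):]
print(skipped)    # [] -- data produced while the job was down is lost
```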
Is the link I posted, or for that matter the scaladoc, really not clear on
that point?
Hi all,
I am currently using Spark Streaming to consume and save logs every hour in
our production pipeline. The current setup is a crontab job that checks
every minute whether the job is still running and, if not, resubmits a
Spark Streaming job. I am currently using the direct approach for
consuming from Kafka.
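A watchdog in the spirit of the cron job described above might look like the following sketch. The application name and the resubmission command are hypothetical placeholders, not anything from this thread:

```python
# Hypothetical watchdog run from cron: check whether the streaming app
# appears in the process list, and resubmit it if it does not.
import subprocess

APP_NAME = "LogConsumer"   # hypothetical application name


def is_running(ps_output, app_name):
    """True if any line of the ps output mentions the app name."""
    return any(app_name in line for line in ps_output.splitlines())


ps_output = subprocess.run(["ps", "ax"], capture_output=True,
                           text=True).stdout
if not is_running(ps_output, APP_NAME):
    print("resubmitting", APP_NAME)
    # subprocess.run(["spark-submit", ...])  # elided; deploy-specific
```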
Have you read
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md ?
1. There's nothing preventing that.
2. Checkpointing will give you at-least-once semantics, provided you have
sufficient Kafka retention. Be aware that checkpoints aren't recoverable
if you upgrade code.
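The retention caveat in point 2 can be made concrete with a small check (plain Python, illustrative offsets): a restart can only replay from the checkpoint if Kafka still retains that offset.

```python
# Why "sufficient kafka retention" matters: a checkpoint only helps if the
# checkpointed offset is still inside Kafka's retained log.

def recoverable(checkpointed_offset, earliest_retained_offset):
    """True if Kafka still retains the checkpointed offset, so the
    restarted job can replay from it."""
    return checkpointed_offset >= earliest_retained_offset

# Checkpoint at offset 100; Kafka retains offsets 50 onward: replay works.
print(recoverable(100, 50))    # True

# Job was down longer than the retention window; retention has advanced
# past offset 100: data loss despite checkpointing.
print(recoverable(100, 200))   # False
```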
If a Spark Streaming job stops at 12:01 and I resume it at 12:02, will it
still consume the data that were produced to Kafka at 12:01? Or will it
just start consuming from the current time?
On Tue, May 19, 2015 at 10:58 AM, Cody Koeninger c...@koeninger.org wrote: