Re: [Structured Streaming] Application Updates in Production

2018-03-22 Thread Tathagata Das
> To: Priyank Shrivastava <priy...@asperasoft.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: [Structured Streaming] Application Updates in Production
> Date: Wed, Mar 21, 2018 5:28 PM
>
> Why do you want to start the new code in parallel to the old one? Why not stop the ...

Re: [Structured Streaming] Application Updates in Production

2018-03-21 Thread Tathagata Das
Why do you want to start the new code in parallel to the old one? Why not stop the old one, and then start the new one? Structured Streaming ensures that all checkpoint information (offsets and state) is forward-compatible (as long as the state schema is unchanged), hence the new code should be able to restart from the existing checkpoint and resume where the old query left off.
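
A minimal sketch in Scala of the upgrade pattern described above: stop the old query, then start the new build of the application against the same checkpoint location. The broker, topic, sink, paths, and aggregation are illustrative assumptions, not details from the thread.

    import org.apache.spark.sql.SparkSession

    object UpgradedStreamingApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("UpgradedStreamingApp")   // new build of the application
          .getOrCreate()
        import spark.implicits._

        // Same source and same checkpoint location as the previous version of the app.
        // On start, the query picks up the offsets and state it left behind and
        // resumes from there instead of reprocessing the topic from the beginning.
        val counts = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   // assumed broker
          .option("subscribe", "events")                        // assumed topic
          .load()
          .selectExpr("CAST(value AS STRING) AS key")
          .groupBy($"key")                                      // stateful aggregation;
          .count()                                              // its state schema must not change

        counts.writeStream
          .outputMode("update")
          .format("console")                                         // sink is illustrative
          .option("checkpointLocation", "/checkpoints/events-agg")   // unchanged path
          .start()
          .awaitTermination()
      }
    }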

[Structured Streaming] Application Updates in Production

2018-03-21 Thread Priyank Shrivastava
I am using Structured Streaming with Spark 2.2. We are using Kafka as our source and are using checkpoints for failure recovery and end-to-end exactly-once guarantees. I would like to get some more information on how to handle updates to the application when there is a change in its stateful operations.
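
A sketch of the kind of query the question describes: a Kafka source, a stateful operation, and a checkpoint location used for recovery. The broker, topic, state class, and paths are illustrative assumptions; per the reply above, changing the schema of the per-key state is what would make an old checkpoint incompatible with new code.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

    // State kept per key across micro-batches; this schema must stay stable
    // across application upgrades for the checkpoint to remain usable.
    case class RunningCount(count: Long)

    object StatefulKafkaApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("StatefulKafkaApp").getOrCreate()
        import spark.implicits._

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  // assumed broker
          .option("subscribe", "events")                       // assumed topic
          .load()
          .selectExpr("CAST(key AS STRING) AS k")
          .as[String]

        // Arbitrary stateful operation: keep a running count per key.
        val counted = events
          .groupByKey(v => v)
          .mapGroupsWithState(GroupStateTimeout.NoTimeout()) {
            (key: String, values: Iterator[String], state: GroupState[RunningCount]) =>
              val newCount = state.getOption.map(_.count).getOrElse(0L) + values.size
              state.update(RunningCount(newCount))
              (key, newCount)
          }

        counted.writeStream
          .outputMode(OutputMode.Update())
          .format("console")                                          // sink is illustrative
          .option("checkpointLocation", "/checkpoints/stateful-app")  // used for recovery
          .start()
          .awaitTermination()
      }
    }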