Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
Processing of the same data more than once can happen only when the app recovers after a failure or during an upgrade. So how do I apply your 2nd solution only for 1-2 hrs after restart? On Wed, Jan 25, 2017 at 12:51 PM, shyla deshpande wrote: > Thanks Burak. I do want

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
Thanks Burak. I do want accuracy, that is why I want to make it idempotent. I will try out your 2nd solution. On Wed, Jan 25, 2017 at 12:27 PM, Burak Yavuz wrote: > Yes you may. Depends on if you want exact values or if you're okay with > approximations. With Big Data,

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread Burak Yavuz
Yes you may. Depends on if you want exact values or if you're okay with approximations. With Big Data, generally you would be okay with approximations. Try both out, see what scales/works with your dataset. Maybe you may handle the second implementation. On Wed, Jan 25, 2017 at 12:23 PM, shyla

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
Thanks Burak. But with BloomFilter, won't I be getting false positives? On Wed, Jan 25, 2017 at 11:28 AM, Burak Yavuz wrote: > I noticed that 1 wouldn't be a problem, because you'll save the > BloomFilter in the state. > > For 2, you would keep a Map of UUIDs to the
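
To put a number on the false-positive concern: a Bloom filter never reports a false negative, so in this setup a false positive would cause a genuinely new record to be skipped (an undercount), not a double count. The sketch below uses the standard approximation p ≈ (1 - e^(-k*n/m))^k for a filter with m bits, k hash functions, and n inserted items. The function name and the example sizes are illustrative assumptions, not values from this thread.

```python
import math


def bloom_false_positive_rate(num_bits, num_hashes, num_items):
    """Approximate false-positive probability of a Bloom filter.

    num_bits   -- m, the size of the bit array
    num_hashes -- k, the number of hash functions
    num_items  -- n, the number of distinct items inserted so far
    """
    fill = 1.0 - math.exp(-num_hashes * num_items / num_bits)
    return fill ** num_hashes


if __name__ == "__main__":
    # Hypothetical sizing: about one million bits, 7 hashes, 100k record ids.
    rate = bloom_false_positive_rate(1 << 20, 7, 100_000)
    print(f"approximate false-positive rate: {rate:.4%}")
```

The rate grows as more ids are added, so a filter that is never reset would drift toward rejecting new records. Whether that is acceptable depends on how exact the counts need to be.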

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread Burak Yavuz
I noticed that 1 wouldn't be a problem, because you'll save the BloomFilter in the state. For 2, you would keep a Map of UUIDs to the timestamp of when you saw them. If the UUID exists in the map, then you wouldn't increase the count. If the timestamp of a UUID expires, you would remove it from
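
The second approach might be sketched roughly as follows. This is a plain Python model of the per-key update logic, not the Spark mapWithState API. The function name, the (count, seen) state layout, and the two-hour expiry are assumptions made for illustration; the expiry is loosely based on the "1-2 hrs" replay window mentioned elsewhere in the thread.

```python
def update_count_with_ttl(state, record_id, now, ttl_seconds=7200.0):
    """Return the new (count, seen) state for one word.

    state       -- (count, seen), where seen maps record UUID -> time first seen
    record_id   -- UUID attached to the incoming record upstream
    now         -- current time in seconds (passed in so the logic is testable)
    ttl_seconds -- how long a UUID is remembered (assumed value)
    """
    count, seen = state
    # Drop UUIDs whose timestamp has expired so the map stays bounded.
    seen = {rid: ts for rid, ts in seen.items() if now - ts < ttl_seconds}
    if record_id not in seen:
        # First sighting within the window: remember it and count it.
        seen[record_id] = now
        count += 1
    return count, seen


if __name__ == "__main__":
    state = (0, {})
    state = update_count_with_ttl(state, "uuid-1", now=0.0)
    state = update_count_with_ttl(state, "uuid-1", now=10.0)  # replay, ignored
    state = update_count_with_ttl(state, "uuid-2", now=20.0)
    print(state[0])
```

The trade-off is that a replay arriving after its UUID has expired would be counted again, so the expiry has to be longer than the longest replay window you expect after a restart.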

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
In the previous email you gave me 2 solutions:
1. Bloom filter --> problem in repopulating the bloom filter on restarts
2. Keeping the state of the unique ids
Please elaborate on 2. On Wed, Jan 25, 2017 at 10:53 AM, Burak Yavuz wrote: > I don't have any sample code, but on a

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread Burak Yavuz
I don't have any sample code, but on a high level: My state would be: (Long, BloomFilter[UUID]) In the update function, my value will be the UUID of the record, since the word itself is the key. I'll ask my BloomFilter if I've seen this UUID before. If not, I increase the count and also add the UUID to the filter.
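
A rough model of that update function, in plain Python rather than Spark code: the state is a (count, filter) pair per word, and the count only increases when the filter has not seen the record's UUID. The BloomFilter class here is a tiny hand-rolled stand-in written for this sketch; its sizes, its hashing scheme, and the function name update_count_with_bloom are all assumptions, not an existing library API.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter for illustration; the default sizes are arbitrary."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def update_count_with_bloom(state, record_id):
    """state is (count, bloom_filter) for one word; returns the new state."""
    count, seen = state
    if seen.might_contain(record_id):
        return count, seen  # probably a replay: leave the count unchanged
    seen.add(record_id)
    return count + 1, seen


if __name__ == "__main__":
    state = (0, BloomFilter())
    for rid in ["uuid-1", "uuid-1", "uuid-2"]:
        state = update_count_with_bloom(state, rid)
    print(state[0])
```

Because the filter lives inside the state, it would be checkpointed and restored together with the count, which is the reason restarts should not require repopulating it separately.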

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
Hi Burak, Thanks for the response. Can you please elaborate on your idea of storing the state of the unique ids? Do you have any sample code or links I can refer to? Thanks On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz wrote: > Off the top of my head... (Each may have its own

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread Burak Yavuz
Off the top of my head... (Each may have its own issues) If upstream you add a uniqueId to all your records, then you may use a BloomFilter to approximate if you've seen a row before. The problem I can see with that approach is how to repopulate the bloom filter on restarts. If you are certain

Re: How to make the state in a streaming application idempotent?

2017-01-24 Thread shyla deshpande
Please share your thoughts. On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande wrote: > > > On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande > wrote: > >> My streaming application stores a lot of aggregations using mapWithState. >> >> I want

Re: How to make the state in a streaming application idempotent?

2017-01-24 Thread shyla deshpande
On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande wrote: > My streaming application stores a lot of aggregations using mapWithState. > > I want to know all the possible ways I can make it idempotent. > > Please share your views. > > Thanks > > On Mon, Jan 23, 2017

Re: How to make the state in a streaming application idempotent?

2017-01-24 Thread shyla deshpande
My streaming application stores a lot of aggregations using mapWithState. I want to know all the possible ways I can make it idempotent. Please share your views. Thanks On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande wrote: > In a Wordcount application which

How to make the state in a streaming application idempotent?

2017-01-23 Thread shyla deshpande
I have a Wordcount application which stores the count of all the words input so far using mapWithState. How do I make sure my counts are not messed up if I happen to read a line more than once? I'd appreciate your response. Thanks