The same data can be processed more than once only when the app recovers
from a failure or during an upgrade. So how do I apply your 2nd solution
only for the first 1-2 hours after a restart?
On Wed, Jan 25, 2017 at 12:51 PM, shyla deshpande
wrote:
Thanks Burak. I do want accuracy, that is why I want to make it idempotent.
I will try out your 2nd solution.
On Wed, Jan 25, 2017 at 12:27 PM, Burak Yavuz wrote:
Yes you may. Depends on if you want exact values or if you're okay with
approximations. With Big Data, generally you would be okay with
approximations. Try both out, see what scales/works with your dataset.
You may well be able to handle the second implementation.
On Wed, Jan 25, 2017 at 12:23 PM, shyla deshpande wrote:
Thanks Burak. But with a BloomFilter, won't I be getting false positives?
On Wed, Jan 25, 2017 at 11:28 AM, Burak Yavuz wrote:
I noticed that 1 wouldn't be a problem, because you'll save the BloomFilter
in the state.
For 2, you would keep a Map of UUIDs to the timestamp of when you saw
them. If the UUID exists in the map, then you wouldn't increase the count.
If the timestamp of a UUID expires, you would remove it from the map.
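
A minimal sketch of that second option, written as a plain Scala function so it can be tried without a cluster. The names DedupState, MapDedup, nowMs and ttlMs are hypothetical and not from this thread; inside Spark, the same logic would presumably live in the function passed to mapWithState, with the state held in the State object.

```scala
import java.util.UUID

// Per-word state: the running count plus the ids already counted,
// each mapped to the time (epoch millis) at which it was first seen.
final case class DedupState(count: Long, seenAt: Map[UUID, Long])

object MapDedup {
  // Returns the new state after seeing one record for this word.
  // ttlMs is an assumed retention window, not a Spark setting.
  def update(state: DedupState, recordId: UUID, nowMs: Long, ttlMs: Long): DedupState = {
    // Drop ids whose timestamp has expired so the map stays bounded.
    val live = state.seenAt.filter { case (_, ts) => nowMs - ts < ttlMs }
    if (live.contains(recordId)) state.copy(seenAt = live) // replay: count unchanged
    else DedupState(state.count + 1, live + (recordId -> nowMs))
  }
}
```

The counts are exact only while a replayed record arrives within ttlMs of its first sighting, so the window would need to be at least as long as the longest replay you expect after a restart.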
In the previous email you gave me 2 solutions
1. Bloom filter --> problem in repopulating the bloom filter on restarts
2. keeping the state of the unique ids
Please elaborate on 2.
On Wed, Jan 25, 2017 at 10:53 AM, Burak Yavuz wrote:
I don't have any sample code, but on a high level:
My state would be: (Long, BloomFilter[UUID])
In the update function, my value will be the UUID of the record, since the
word itself is the key.
I'll ask my BloomFilter if I've seen this UUID before. If not, I increase
the count and also add the UUID to the filter.
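
A rough sketch of that update logic in plain Scala, so it runs without Spark. SimpleBloomFilter, WordState and BloomDedup are hypothetical names, and the tiny filter here is only for illustration; in a real job you would more likely use an existing implementation such as org.apache.spark.util.sketch.BloomFilter and keep the pair in the mapWithState state.

```scala
import java.util.{BitSet, UUID}
import scala.util.hashing.MurmurHash3

// Tiny illustrative Bloom filter (hypothetical helper, not Spark's implementation).
final class SimpleBloomFilter(numBits: Int, numHashes: Int) extends Serializable {
  private val bits = new BitSet(numBits)

  // Derive the i-th bit position by hashing the id with seed i.
  private def position(id: UUID, i: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(id.toString, i), numBits)

  // True if every bit for this id is set: "probably seen" (false positives possible).
  def mightContain(id: UUID): Boolean =
    (0 until numHashes).forall(i => bits.get(position(id, i)))

  // Mark the id as seen by setting all of its bits (mutates the filter in place).
  def put(id: UUID): Unit =
    (0 until numHashes).foreach(i => bits.set(position(id, i)))
}

// Per-word state: the running count plus the filter of record ids already counted.
final case class WordState(count: Long, seen: SimpleBloomFilter)

object BloomDedup {
  // Count the record only if the filter says its id is definitely new.
  def update(state: WordState, recordId: UUID): WordState =
    if (state.seen.mightContain(recordId)) state // probably a replay: skip it
    else {
      state.seen.put(recordId)
      state.copy(count = state.count + 1)
    }
}
```

Because a Bloom filter can report an unseen id as seen, this version can only undercount, never double count; how often that happens depends on the filter size relative to the number of ids per word.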
Hi Burak,
Thanks for the response. Can you please elaborate on your idea of storing
the state of the unique ids.
Do you have any sample code or links I can refer to?
Thanks
On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz wrote:
Off the top of my head... (Each may have its own issues)
If upstream you add a uniqueId to all your records, then you may use a
BloomFilter to approximate if you've seen a row before.
The problem I can see with that approach is how to repopulate the bloom
filter on restarts.
If you are certain
Please share your thoughts.
On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande
wrote:
On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande
wrote:
My streaming application stores a lot of aggregations using mapWithState.
I want to know all the possible ways I can make it idempotent.
Please share your views.
Thanks
On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande
wrote:
I have a word count application that stores the count of all the words
input so far using mapWithState. How do I make sure my counts are not
messed up if I happen to read a line more than once?
Appreciate your response.
Thanks