I don't have any sample code, but at a high level:

My state would be (Long, BloomFilter[UUID]): the running count, plus a
filter of the record UUIDs seen so far.
In the update function, the value will be the UUID of the record, since the
word itself is the key.
I'll ask the BloomFilter whether it has seen this UUID before. If not,
increment the count and add the UUID to the filter.
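
To make that concrete, here is a rough sketch of the update logic in plain Python. It is only a model of the idea, not Spark code: the small BloomFilter class, the update_count function, and the filter sizes are all made up for illustration, and in a real job the same logic would live inside the mapWithState update function with a proper Bloom filter implementation.

```python
import hashlib
import uuid


class BloomFilter:
    """Tiny illustrative Bloom filter: false positives possible, no false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Sizes are arbitrary placeholders, not tuned recommendations.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive the bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def update_count(record_id, state):
    """Per-word update: state is (count, seen_ids), or None for a new word."""
    count, seen = state if state is not None else (0, BloomFilter())
    if not seen.might_contain(record_id):
        # First time this record UUID shows up for the word: count it.
        seen.add(record_id)
        count += 1
    return count, seen


# Replaying the first record does not bump the count a second time.
first, second = uuid.uuid4(), uuid.uuid4()
state = None
for record_id in (first, second, first):
    state = update_count(record_id, state)
```

One caveat: a Bloom filter can return a false positive, in which case a genuinely new record is skipped, so the count can come out slightly low but is never inflated by replays.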

Does that make sense?


On Wed, Jan 25, 2017 at 9:28 AM, shyla deshpande <deshpandesh...@gmail.com>
wrote:

> Hi Burak,
> Thanks for the response. Can you please elaborate on your idea of storing
> the state of the unique ids.
> Do you have any sample code or links I can refer to.
> Thanks
>
> On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> Off the top of my head... (Each may have its own issues)
>>
>> If you add a uniqueId to all your records upstream, then you may use a
>> BloomFilter to approximate whether you've seen a row before.
>> The problem I can see with that approach is how to repopulate the bloom
>> filter on restarts.
>>
>> If you are certain that you're not going to reprocess some data after a
>> certain time (e.g. the same data can only show up again within the last
>> 2 hours, never later), then you may instead keep the uniqueIds themselves
>> in the state, and then age them out after that time.
>>
>>
>> Best,
>> Burak
>>
>> On Tue, Jan 24, 2017 at 9:53 PM, shyla deshpande <
>> deshpandesh...@gmail.com> wrote:
>>
>>> Please share your thoughts.....
>>>
>>> On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande <
>>> deshpandesh...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <
>>>> deshpandesh...@gmail.com> wrote:
>>>>
>>>>> My streaming application stores a lot of aggregations using
>>>>> mapWithState.
>>>>>
>>>>> I want to know all the possible ways I can make it
>>>>> idempotent.
>>>>>
>>>>> Please share your views.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande <
>>>>> deshpandesh...@gmail.com> wrote:
>>>>>
>>>>>> In a Wordcount application which stores the count of all the words
>>>>>> input so far using mapWithState, how do I make sure my counts are not
>>>>>> messed up if I happen to read a line more than once?
>>>>>>
>>>>>> Appreciate your response.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
