Hi,

I have two streaming RDDs, RDD1 and RDD2, and I want to cogroup them.
The data do not arrive at the same time, and sometimes they arrive with
some delay.
Once I have all the data, I want to insert it into MongoDB.

For example, imagine that I get:
RDD1 --> T 0
RDD2 --> T 0.5
I cogroup them, but I cannot store the result in Mongo yet because more
data could arrive in the next window/slide.
RDD2' --> T 1.5
Another RDD2' arrives, but I only want to save to Mongo once. So I should
save only when I have all the data. What I do know is the maximum time I
should wait.

Ideally, I would like to save to MongoDB in the last slide for each RDD,
when I know that it is no longer possible to get more RDD2 data to join
with RDD1.
Is this possible? If so, how?
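
To make the behaviour I am after concrete, here is a minimal sketch in plain
Python (not Spark code). It buffers records per key and releases each key
exactly once, after the maximum wait has passed. The names `PendingJoin`,
`add`, `flush_ready` and the value `max_wait=2.0` are hypothetical and only
for illustration; the real job would need something equivalent on top of the
streams.

```python
from collections import defaultdict


class PendingJoin:
    """Buffers records per key and releases each key exactly once."""

    def __init__(self, max_wait):
        self.max_wait = max_wait          # longest delay to wait for late data (assumed known)
        self.first_seen = {}              # key -> time the first record arrived
        self.left = defaultdict(list)     # key -> buffered RDD1 values
        self.right = defaultdict(list)    # key -> buffered RDD2 values

    def add(self, side, key, value, t):
        # Remember when the key first appeared; the deadline is counted from there.
        self.first_seen.setdefault(key, t)
        (self.left if side == "rdd1" else self.right)[key].append(value)

    def flush_ready(self, now):
        """Return {key: (rdd1_values, rdd2_values)} for keys whose wait has expired."""
        ready = {}
        for key, t0 in list(self.first_seen.items()):
            if now - t0 >= self.max_wait:
                ready[key] = (self.left.pop(key, []), self.right.pop(key, []))
                del self.first_seen[key]
        return ready


# The timeline from the example above, with an assumed maximum wait of 2.0.
join = PendingJoin(max_wait=2.0)
join.add("rdd1", "k", "a", t=0.0)
join.add("rdd2", "k", "b", t=0.5)
early = join.flush_ready(now=1.0)     # too early, nothing is released
join.add("rdd2", "k", "b2", t=1.5)
final = join.flush_ready(now=2.0)     # released once, with all buffered values
```

Here `final` would be the single record written to MongoDB, and `early` stays
empty because the wait has not expired yet.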

Maybe there is another way to solve this problem. Any ideas?
