Hi all,

I was hoping for some advice on creating a MicroBatchStream for MongoDB.

MongoDB has a tailable cursor that listens for changes in a collection
(known as a change stream). As the change stream cursor is consumed, it
reports a resume token that identifies the last operation seen. I would
like to use this resume token as my offset.
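To make that concrete, here is a minimal sketch of that behaviour using
the synchronous MongoDB Java driver (the connection string, database,
and collection names are placeholders). tryNext() polls without
blocking indefinitely, and getResumeToken() can advance even when no
change was returned:

import com.mongodb.client.ChangeStreamIterable;
import com.mongodb.client.MongoChangeStreamCursor;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.BsonDocument;
import org.bson.Document;

public class ChangeStreamTail {
  public static void main(String[] args) {
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
      ChangeStreamIterable<Document> stream =
          client.getDatabase("test").getCollection("events").watch();
      try (MongoChangeStreamCursor<ChangeStreamDocument<Document>> cursor =
               stream.cursor()) {
        // tryNext() returns the next change, or null if none is available.
        ChangeStreamDocument<Document> change = cursor.tryNext();
        // The token can advance even when tryNext() returned null
        // (post-batch resume tokens on an otherwise idle stream).
        BsonDocument resumeToken = cursor.getResumeToken();
        System.out.println("change: " + change + ", token: " + resumeToken);
      }
    }
  }
}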

What I'm struggling with is how to signal this resume token offset from
the PartitionReader back to the MicroBatchStream. Even when the cursor
returns no result, a new last seen resume token is available.
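To illustrate where the information gets stranded, here is a hedged
sketch of the reader side against the Spark 3.x DataSourceV2
interfaces. The class and field names are mine, and row conversion is
elided; the point is that latestResumeToken only exists on the
executor:

import java.io.IOException;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.connector.read.PartitionReader;
import com.mongodb.client.MongoChangeStreamCursor;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.BsonDocument;
import org.bson.Document;

// Hypothetical reader: tails one change stream cursor and remembers the
// latest resume token. The open question is how to hand that token back
// to the MicroBatchStream on the driver.
public class MongoChangeStreamPartitionReader
    implements PartitionReader<InternalRow> {
  private final MongoChangeStreamCursor<ChangeStreamDocument<Document>> cursor;
  private ChangeStreamDocument<Document> current;
  private BsonDocument latestResumeToken;

  public MongoChangeStreamPartitionReader(
      MongoChangeStreamCursor<ChangeStreamDocument<Document>> cursor) {
    this.cursor = cursor;
  }

  @Override
  public boolean next() {
    current = cursor.tryNext();
    // Even when tryNext() returns null, the cursor's token has advanced.
    latestResumeToken = cursor.getResumeToken();
    return current != null;
  }

  @Override
  public InternalRow get() {
    // Converting the change document to an InternalRow is elided here.
    throw new UnsupportedOperationException("row conversion elided");
  }

  @Override
  public void close() throws IOException {
    cursor.close();
  }
}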

When planning partitions I only know the start offset; the resume token
represents the end offset. Because I'm using a single change stream
cursor, I only produce a single partition per batch. However, I would
like the next partition to continue from the last resume token seen by
the PartitionReader.
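Sketched against the same Spark 3.x interfaces, that setup might look
like the following. ResumeTokenOffset, ChangeStreamPartition, and the
stream class are all hypothetical names of mine, the reader factory is
elided, and latestOffset() is deliberately left as a placeholder
because that is exactly the missing piece:

import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream;
import org.apache.spark.sql.connector.read.streaming.Offset;
import org.bson.BsonDocument;

// Hypothetical offset wrapping the resume token; Spark persists json()
// in the offset log, so the token round-trips via deserializeOffset().
class ResumeTokenOffset extends Offset {
  private final String tokenJson;
  ResumeTokenOffset(BsonDocument token) { this.tokenJson = token.toJson(); }
  BsonDocument resumeToken() { return BsonDocument.parse(tokenJson); }
  @Override public String json() { return tokenJson; }
}

// Hypothetical partition: carries only the start token, kept as JSON so
// the partition stays Serializable.
class ChangeStreamPartition implements InputPartition {
  final String startTokenJson;
  ChangeStreamPartition(BsonDocument startToken) {
    this.startTokenJson = startToken.toJson();
  }
}

// Hypothetical stream: one change stream cursor, one partition per batch.
class MongoChangeStreamMicroBatchStream implements MicroBatchStream {

  @Override public Offset initialOffset() {
    // An empty token could mean "open the stream at the current time".
    return new ResumeTokenOffset(new BsonDocument());
  }

  @Override public Offset latestOffset() {
    // The crux of the question: the latest token lives in the
    // PartitionReader on an executor, and there is no obvious channel
    // for surfacing it here on the driver.
    throw new UnsupportedOperationException("needs the reader's last token");
  }

  @Override public InputPartition[] planInputPartitions(Offset start, Offset end) {
    // Only the start offset is really known; the reader tails from there.
    return new InputPartition[] {
        new ChangeStreamPartition(((ResumeTokenOffset) start).resumeToken())
    };
  }

  @Override public PartitionReaderFactory createReaderFactory() {
    throw new UnsupportedOperationException("factory elided");
  }

  @Override public Offset deserializeOffset(String json) {
    return new ResumeTokenOffset(BsonDocument.parse(json));
  }

  @Override public void commit(Offset end) { /* nothing to acknowledge */ }

  @Override public void stop() { /* close the MongoClient here */ }
}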

Is this approach possible? If so, how would I pass that information
back to the Spark driver / MicroBatchStream?

Ross
