Hi Jingsong Lee,
Thanks for the details. Were you able to achieve end-to-end exactly-once
support with Mongo?
Also, for per-event lookups against Mongo (Kafka -> process event -> lookup
Mongo -> enrich event -> sink to Mongo), I am thinking of using Async I/O
(https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html).
In your implementation, did you have a need to use this API?
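For reference, here is a minimal sketch of what such a lookup could look like
with Flink's Async I/O, assuming the Mongo sync driver run on a private
executor (the reactive driver would be an alternative); the URI, database,
collection, and field names are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.bson.Document;

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MongoLookupFunction extends RichAsyncFunction<Document, Document> {
    private transient MongoClient client;
    private transient MongoCollection<Document> collection;
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create("mongodb://localhost:27017"); // placeholder URI
        collection = client.getDatabase("mydb").getCollection("profiles"); // placeholder names
        executor = Executors.newFixedThreadPool(10); // bounds concurrent blocking lookups
    }

    @Override
    public void asyncInvoke(Document event, ResultFuture<Document> resultFuture) {
        CompletableFuture
            .supplyAsync(
                () -> collection.find(Filters.eq("_id", event.get("userId"))).first(),
                executor)
            .thenAccept(profile -> {
                if (profile != null) {
                    event.append("profile", profile); // enrich the event with the lookup result
                }
                resultFuture.complete(Collections.singleton(event));
            });
    }

    @Override
    public void close() {
        executor.shutdown();
        client.close();
    }
}

The function would then be wired in with something like
AsyncDataStream.unorderedWait(events, new MongoLookupFunction(), 5000,
TimeUnit.MILLISECONDS, 100), where the timeout and capacity are just example
values.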
Regards,
Vijay
On Sunday, October 13, 2019, 08:11:06 PM PDT, JingsongLee wrote:
Hi Vijay,
I developed an append stream sink for Mongo internally, which writes data in
batches according to a configurable batch size and also provides an asynchronous
flush. It only supports inserts, though, not updates or upserts. It does the job
well and has been working reliably for a long time. (Throughput depends mainly
on the Mongo server.)
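That internal sink is not public, but a rough sketch of the batching idea
described above might look like the following, assuming the Mongo sync driver;
class, database, and collection names are illustrative only, and the
asynchronous-flush part is omitted for brevity:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class MongoAppendSink extends RichSinkFunction<Document> {
    private final String uri;
    private final String database;
    private final String collectionName;
    private final int batchSize;

    private transient MongoClient client;
    private transient MongoCollection<Document> collection;
    private transient List<Document> buffer;

    public MongoAppendSink(String uri, String database, String collectionName, int batchSize) {
        this.uri = uri;
        this.database = database;
        this.collectionName = collectionName;
        this.batchSize = batchSize;
    }

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create(uri);
        collection = client.getDatabase(database).getCollection(collectionName);
        buffer = new ArrayList<>(batchSize);
    }

    @Override
    public void invoke(Document value, Context context) {
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            collection.insertMany(buffer); // append-only: inserts, no updates or upserts
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // drain any remaining records on shutdown
        client.close();
        // a production sink would also flush in snapshotState() so that
        // checkpoints give at-least-once delivery
    }
}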
I've also looked at Mongo's API: you can use "UpdateOptions.upsert" to achieve
upsert functionality, which should make such a sink similar to
ElasticsearchUpsertTableSink.
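As a minimal illustration of that upsert path (the URI, database, collection,
key, and field names are placeholders):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import org.bson.Document;

public class MongoUpsertExample {
    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("mydb").getCollection("events");

        Document row = new Document("name", "vijay").append("count", 42);

        // upsert(true): insert the document if the key is absent, update it otherwise
        collection.updateOne(
                Filters.eq("_id", "key-1"),
                new Document("$set", row),
                new UpdateOptions().upsert(true));
    }
}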
As for reading, the main problem is how to support distributed reads with
multiple parallel instances: you need the splitKeys of the Mongo shards to build
filtering conditions that split the Mongo source across the parallel readers.
(I think Beam's MongoDbIO is a very good example.)
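A rough sketch of that split idea, assuming a sorted list of split keys has
already been obtained (e.g. the way Beam's MongoDbIO gets them, via the
splitVector command or config.chunks for sharded collections), with each
parallel subtask scanning one disjoint [min, max) range on "_id":

import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.ArrayList;
import java.util.List;

public class MongoSplitReader {

    /** Builds one range filter per split so each parallel subtask scans a disjoint slice. */
    public static List<Bson> buildRangeFilters(List<Object> splitKeys) {
        List<Bson> filters = new ArrayList<>();
        Object lower = null;
        for (Object upper : splitKeys) {
            filters.add(rangeFilter(lower, upper));
            lower = upper;
        }
        filters.add(rangeFilter(lower, null)); // last split: from the final key to the end
        return filters;
    }

    private static Bson rangeFilter(Object lower, Object upper) {
        if (lower == null && upper == null) {
            return new Document(); // single split, full scan
        } else if (lower == null) {
            return Filters.lt("_id", upper);
        } else if (upper == null) {
            return Filters.gte("_id", lower);
        }
        return Filters.and(Filters.gte("_id", lower), Filters.lt("_id", upper));
    }

    /** Each parallel reader instance would then run something like this for its own split. */
    public static FindIterable<Document> readSplit(MongoCollection<Document> collection, Bson filter) {
        return collection.find(filter);
    }
}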
Best,
Jingsong Lee
--
From: Vijay Srinivasaraghavan
Send Time: 2019-10-13 (Sunday) 11:52
To: Dev
Subject: Mongo Connector
Hello,
Do we know what level of support we have for Mongo? The documentation page
points to a connector repo that is very old (last updated five years ago) and
looks like it was just sample code to showcase the integration.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/connectors.html#access-mongodb
I am planning to build a pipeline that makes heavy use of Mongo (reads as well
as bulk upserts). Has anyone used Mongo in a pipeline and would like to share
some of their experience?
Are there any known limitations and gotchas?
Appreciate any inputs.
Regards,
Vijay