Hi Jingsong Lee,
Thanks for the details. Were you able to achieve end-to-end exactly-once
support with Mongo?
Also, for per-event lookups against Mongo (Kafka -> process event -> lookup
Mongo -> enrich event -> sink to Mongo), I am thinking of using Async I/O
(https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html).
In your implementation, did you have a need to use this API?
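For reference, here is a minimal sketch of what such a lookup could look like
with Flink's Async I/O, assuming the Mongo sync driver run on a private
executor (the reactive driver would be an alternative); the URI, database,
collection, and field names are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.bson.Document;

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MongoLookupFunction extends RichAsyncFunction<Document, Document> {
    private transient MongoClient client;
    private transient MongoCollection<Document> collection;
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create("mongodb://localhost:27017"); // placeholder URI
        collection = client.getDatabase("mydb").getCollection("profiles"); // placeholder names
        executor = Executors.newFixedThreadPool(10); // bounds concurrent blocking lookups
    }

    @Override
    public void asyncInvoke(Document event, ResultFuture<Document> resultFuture) {
        CompletableFuture
            .supplyAsync(
                () -> collection.find(Filters.eq("_id", event.get("userId"))).first(),
                executor)
            .thenAccept(profile -> {
                if (profile != null) {
                    event.append("profile", profile); // enrich the event with the lookup result
                }
                resultFuture.complete(Collections.singleton(event));
            });
    }

    @Override
    public void close() {
        executor.shutdown();
        client.close();
    }
}

The function would then be wired in with something like
AsyncDataStream.unorderedWait(events, new MongoLookupFunction(), 5000,
TimeUnit.MILLISECONDS, 100), where the timeout and capacity are just example
values.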
Regards,
Vijay
On Sunday, October 13, 2019, 08:11:06 PM PDT, JingsongLee wrote:
Hi Vijay,
I developed an append stream sink for Mongo internally, which writes data in
batches according to a configurable batch size and also provides an asynchronous
flush. It only supports inserts, though, not updates or upserts. It does the job
well and has been working reliably for a long time. (Throughput depends mainly
on the Mongo server.)
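That internal sink is not public, but a rough sketch of the batching idea
described above might look like the following, assuming the Mongo sync driver;
class, database, and collection names are illustrative only, and the
asynchronous-flush part is omitted for brevity:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class MongoAppendSink extends RichSinkFunction<Document> {
    private final String uri;
    private final String database;
    private final String collectionName;
    private final int batchSize;

    private transient MongoClient client;
    private transient MongoCollection<Document> collection;
    private transient List<Document> buffer;

    public MongoAppendSink(String uri, String database, String collectionName, int batchSize) {
        this.uri = uri;
        this.database = database;
        this.collectionName = collectionName;
        this.batchSize = batchSize;
    }

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create(uri);
        collection = client.getDatabase(database).getCollection(collectionName);
        buffer = new ArrayList<>(batchSize);
    }

    @Override
    public void invoke(Document value, Context context) {
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            collection.insertMany(buffer); // append-only: inserts, no updates or upserts
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // drain any remaining records on shutdown
        client.close();
        // a production sink would also flush in snapshotState() so that
        // checkpoints give at-least-once delivery
    }
}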
I've also looked at Mongo's API: you can use "UpdateOptions.upsert" to achieve
upsert functionality, which should make such a sink similar to
ElasticsearchUpsertTableSink.
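As a minimal illustration of that upsert path (the URI, database, collection,
key, and field names are placeholders):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import org.bson.Document;

public class MongoUpsertExample {
    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("mydb").getCollection("events");

        Document row = new Document("name", "vijay").append("count", 42);

        // upsert(true): insert the document if the key is absent, update it otherwise
        collection.updateOne(
                Filters.eq("_id", "key-1"),
                new Document("$set", row),
                new UpdateOptions().upsert(true));
    }
}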
As for reading, the main problem is how to support distributed reads with
multiple parallel instances: you need the splitKeys of the Mongo shards to build
filtering conditions that split the Mongo source across the parallel readers.
(I think Beam's MongoDbIO is a very good example.)
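A rough sketch of that split idea, assuming a sorted list of split keys has
already been obtained (e.g. the way Beam's MongoDbIO gets them, via the
splitVector command or config.chunks for sharded collections), with each
parallel subtask scanning one disjoint [min, max) range on "_id":

import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.ArrayList;
import java.util.List;

public class MongoSplitReader {

    /** Builds one range filter per split so each parallel subtask scans a disjoint slice. */
    public static List<Bson> buildRangeFilters(List<Object> splitKeys) {
        List<Bson> filters = new ArrayList<>();
        Object lower = null;
        for (Object upper : splitKeys) {
            filters.add(rangeFilter(lower, upper));
            lower = upper;
        }
        filters.add(rangeFilter(lower, null)); // last split: from the final key to the end
        return filters;
    }

    private static Bson rangeFilter(Object lower, Object upper) {
        if (lower == null && upper == null) {
            return new Document(); // single split, full scan
        } else if (lower == null) {
            return Filters.lt("_id", upper);
        } else if (upper == null) {
            return Filters.gte("_id", lower);
        }
        return Filters.and(Filters.gte("_id", lower), Filters.lt("_id", upper));
    }

    /** Each parallel reader instance would then run something like this for its own split. */
    public static FindIterable<Document> readSplit(MongoCollection<Document> collection, Bson filter) {
        return collection.find(filter);
    }
}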
Best,
Jingsong Lee
--
From: Vijay Srinivasaraghavan
Send Time: 2019-10-13 (Sunday) 11:52
To: Dev
Subject: Mongo Connector
Hello,
Do we know what level of support we have for Mongo? The documentation page
points to a connector repo that is very old (last updated five years ago) and
looks like it was just sample code to showcase the integration.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/connectors.html#access-mongodb
I am planning to build a pipeline that makes heavy use of Mongo (reads as well
as bulk upserts). Has anyone used Mongo in a pipeline and would like to share
some of their experience?
Are there any known limitations and gotchas?
Appreciate any inputs.
Regards,
Vijay