Hi, Martijn, Boto. I have just completed the design of the source and a skeleton design of the sink. I think the current FLIP is still missing part of the sink design. @Boto, would you like to complete the sink part directly on the FLIP page? Looking forward to your reply.
On 2022/08/01 13:42:48 Martijn Visser wrote:
> Hi,
>
> There is currently already a PR submitted to port the JDBC interface to the new interfaces. Can we make sure that this FLIP is being finalized, so that you and other maintainers can work on getting the PRs correct and eventually merged in?
>
> Best regards,
>
> Martijn
>
> On Mon, 4 Jul 2022 at 16:38, Martijn Visser <martijnvis...@apache.org> wrote:
>
> > Hi Roc,
> >
> > Thanks for the FLIP and opening the discussion. I have a couple of initial questions/remarks:
> >
> > * The FLIP contains information for both Source and Sink, but nothing explicitly on the Lookup functionality. I'm assuming we also want to have that implementation covered while porting this to the new interfaces.
> > * The FLIP mentions porting to both the new Source and the new Sink API, but the FLIP only contains detailed information on the Source. Are you planning to add that to the FLIP before casting a vote? Because the discussion should definitely be resolved for both the Source and the Sink.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Sat, 2 Jul 2022 at 06:35, Roc Marshal <fl...@126.com> wrote:
> >
> > > Hi, Weike.
> > >
> > > Thank you for your reply.
> > > As you said, storing too many splits in the SourceEnumerator will increase the load on the JM.
> > > What do you think about introducing a split capacity in the SourceEnumerator to limit the total number, plus a reject or callback mechanism in the timely generation strategy for the case of too many splits, to solve this problem?
> > > Looking forward to a better solution.
> > >
> > > Best regards,
> > > Roc Marshal
> > >
> > > On 2022/07/01 07:58:22 Dong Weike wrote:
> > > > Hi,
> > > >
> > > > Thank you for bringing this up, and I am +1 for this feature.
> > > >
> > > > IMO, one important thing that I would like to mention is that an improperly designed FLIP-27 connector could impose very severe memory pressure on the JobManager, especially when there is an enormous number of splits for the source tables, e.g. there are billions of records to read. Frankly speaking, we have been haunted by this problem for a long time when using the Flink CDC Connectors to read large tables.
> > > >
> > > > Therefore, in order to prevent the JobManager from experiencing frequent OOM faults, JdbcSourceEnumerator should avoid saving too many JdbcSourceSplits in the unassigned list. And it would be better if all the splits were computed on the fly.
> > > >
> > > > Best,
> > > > Weike
> > > >
> > > > -----Original Message-----
> > > > From: Lijie Wang <wa...@gmail.com>
> > > > Sent: 1 July 2022, 10:25 AM
> > > > To: dev@flink.apache.org
> > > > Subject: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27
> > > >
> > > > Hi Roc,
> > > >
> > > > Thanks for driving the discussion.
> > > >
> > > > Could you describe in detail what the JdbcSourceSplit represents? It looks like something is wrong with the comments of JdbcSourceSplit in the FLIP (it is described as "A {@link SourceSplit} that represents a file, or a region of a file....").
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > On Thu, 30 Jun 2022 at 21:41, Roc Marshal <fl...@126.com> wrote:
> > > >
> > > > > Hi, Boto.
> > > > > Thanks for your reply.
> > > > >
> > > > > +1 from me on the watermark strategy definition in the 'streaming' & table source. I'm not sure if FLIP-202[1] is suitable for a separate discussion, but I think your proposal is very helpful to the new source. It would be great if the new source could be compatible with this abstraction.
> > > > >
> > > > > In addition, do we need to support such a special bounded scenario abstraction?
> > > > > The number of JdbcSourceSplits is fixed, but the time needed to generate all of them is not known in advance in the user-defined implementation. Once the end condition of the JdbcSourceSplit generation process is met, no more JdbcSourceSplits will be generated.
> > > > > After all JdbcSourceSplits have been processed, the reader will be notified by the JdbcSourceSplitEnumerator that there are no more JdbcSourceSplits.
> > > > >
> > > > > - [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector
> > > > >
> > > > > Best regards,
> > > > > Roc Marshal
> > > > >
> > > > > On 2022/06/30 09:02:23 João Boto wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On the source, we could improve the JdbcParameterValuesProvider so that it can be defined as one or more queries, or something more dynamic.
> > > > > > Most of the time, if your job is dynamic or has some condition to be met (based on data in the table), you have to create a connection and get that info from the database.
> > > > > >
> > > > > > If we are going to create/allow a "streaming" JDBC source, we should be able to define a watermark and get new data from the table using that watermark.
> > > > > >
> > > > > > For the sink (but it could apply to the source as well), it would be great to be able to set your own implementation of the connection type. For example, if you are connecting to ClickHouse, you could set an implementation based on "BalancedClickhouseDataSource" (this[1] implementation has an example), or set an extended version of an implementation for debugging purposes.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > [1] https://github.com/apache/flink/pull/20097/files#diff-8b36e3403381dc14c748aeb5de0b4ceb7d7daec39594b1eacff1694b5266419d
> > > > > >
> > > > > > On 2022/06/27 13:09:51 Roc Marshal wrote:
> > > > > > > Hi, all,
> > > > > > >
> > > > > > > I would like to open a discussion on porting the JDBC Source to the new Source API (FLIP-27[1]).
> > > > > > >
> > > > > > > Martijn Visser, Jing Ge and I had a preliminary discussion on the JIRA FLINK-25420[2] and planned to start the discussion about the source part first.
> > > > > > >
> > > > > > > Please let me know:
> > > > > > >
> > > > > > > - The issues about the old JDBC source you encountered;
> > > > > > > - The new feature or design you want;
> > > > > > > - More suggestions from other dimensions...
> > > > > > >
> > > > > > > You could find more details in FLIP-239[3].
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > > > > >
> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-25420
> > > > > > >
> > > > > > > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Roc Marshal
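
On the concern raised in the thread about the JobManager holding too many unassigned splits, and the suggestion of a split capacity in the enumerator: below is a minimal Java sketch of one way this could look. It is not the FLIP-239 design and does not use any Flink API. `RangeSplit`, `LazyRangeGenerator` and `BoundedSplitBuffer` are hypothetical names, and splitting the table by numeric key range is an assumption made only for illustration. Splits are computed on the fly, and at most `capacity` of them are buffered at any time.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.Queue;

// Hypothetical stand-in for JdbcSourceSplit: one key range [lower, upper).
final class RangeSplit {
    final long lower;
    final long upper;

    RangeSplit(long lower, long upper) {
        this.lower = lower;
        this.upper = upper;
    }
}

// Yields key ranges one at a time, so the full split list never exists in memory.
final class LazyRangeGenerator implements Iterator<RangeSplit> {
    private final long max;
    private final long step;
    private long next;

    LazyRangeGenerator(long min, long max, long step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.next = min;
        this.max = max;
        this.step = step;
    }

    @Override
    public boolean hasNext() {
        return next < max;
    }

    @Override
    public RangeSplit next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        long lower = next;
        // Clamp the last range to max.
        long upper = (max - lower <= step) ? max : lower + step;
        next = upper;
        return new RangeSplit(lower, upper);
    }
}

// Hypothetical enumerator-side buffer: holds at most `capacity` unassigned splits.
final class BoundedSplitBuffer {
    private final Queue<RangeSplit> unassigned = new ArrayDeque<>();
    private final Iterator<RangeSplit> generator;
    private final int capacity;

    BoundedSplitBuffer(Iterator<RangeSplit> generator, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.generator = generator;
        this.capacity = capacity;
    }

    // Tops the buffer up to capacity, then hands out the oldest split (if any).
    Optional<RangeSplit> poll() {
        refill();
        return Optional.ofNullable(unassigned.poll());
    }

    int buffered() {
        return unassigned.size();
    }

    private void refill() {
        while (unassigned.size() < capacity && generator.hasNext()) {
            unassigned.add(generator.next());
        }
    }
}
```

With `new BoundedSplitBuffer(new LazyRangeGenerator(0, 10, 4), 2)`, each `poll()` returns the next range while the buffer never holds more than two splits; an empty `Optional` signals that there are no more splits to assign.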