RE: Re: Re: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27

Roc Marshal Tue, 09 Aug 2022 19:41:28 -0700

Hi Martijn, Boto,

So far I have completed the design of the source and only a skeleton design of
the sink, so I think the current FLIP is still missing part of the sink design.
@Boto Would you like to complete the sink part directly on the FLIP page?
Looking forward to your reply.


On 2022/08/01 13:42:48 Martijn Visser wrote:
> Hi,
> 
> There is currently already a PR submitted to port the JDBC interface to the
> new interfaces. Can we make sure that this FLIP is being finalized, so that
> you and other maintainers can work on getting the PRs correct and
> eventually merged in?
> 
> Best regards,
> 
> Martijn
> 
> On Mon, 4 Jul 2022 at 16:38, Martijn Visser <[email protected]> wrote:
> 
> > Hi Roc,
> >
> > Thanks for the FLIP and opening the discussion. I have a couple of initial
> > questions/remarks:
> >
> > * The FLIP contains information for both Source and Sink, but nothing
> > explicitly on the Lookup functionality. I'm assuming we also want to have
> > that implementation covered while porting this to the new interfaces.
> > * The FLIP mentions porting to both the new Source and the new Sink API,
> > but the FLIP only contains detailed information on the Source. Are you
> > planning to add that to the FLIP before casting a vote? Because the
> > discussion should definitely be resolved for both the Source and the Sink.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Sat, 2 Jul 2022 at 06:35, Roc Marshal <[email protected]> wrote:
> >
> >> Hi, Weike.
> >>
> >> Thank you for your reply.
> >> As you said, storing too many splits in the SourceEnumerator will increase
> >> the load on the JM.
> >> What do you think about introducing a capacity limit on the splits held in
> >> the SourceEnumerator, together with a reject or callback mechanism in the
> >> timely generation strategy for the case where there are too many splits?
> >> I am looking forward to a better solution.
> >>
> >> Best regards,
> >> Roc Marshal
> >>
> >> On 2022/07/01 07:58:22 Dong Weike wrote:
> >> > Hi,
> >> >
> >> > Thank you for bringing this up, and I am +1 for this feature.
> >> >
> >> > IMO, one important thing that I would like to mention is that an
> >> > improperly designed FLIP-27 connector could impose very severe memory
> >> > pressure on the JobManager, especially when there is an enormous number
> >> > of splits for the source tables, e.g. when there are billions of records
> >> > to read. Frankly speaking, we have been haunted by this problem for a
> >> > long time when using the Flink CDC Connectors to read large tables.
> >> >
> >> > Therefore, in order to prevent the JobManager from experiencing frequent
> >> > OOM faults, JdbcSourceEnumerator should avoid saving too many
> >> > JdbcSourceSplits in the unassigned list. It would be better if all the
> >> > splits were computed on the fly.
> >> >
> >> > Best,
> >> > Weike
> >> >
> >> > -----Original Message-----
> >> > From: Lijie Wang <[email protected]>
> >> > Sent: July 1, 2022, 10:25 AM
> >> > To: [email protected]
> >> > Subject: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27
> >> >
> >> > Hi Roc,
> >> >
> >> > Thanks for driving the discussion.
> >> >
> >> > Could you describe in detail what the JdbcSourceSplit represents? It
> >> > looks like there is something wrong with the comments of JdbcSourceSplit
> >> > in the FLIP (it is described as "A {@link SourceSplit} that represents a
> >> > file, or a region of a file....").
> >> >
> >> > Best,
> >> > Lijie
> >> >
> >> >
> >> > On Thu, 30 Jun 2022 at 21:41, Roc Marshal <[email protected]> wrote:
> >> >
> >> > > Hi, Boto.
> >> > >     Thanks for your reply.
> >> > >
> >> > >    +1 from me on the watermark strategy definition in the ‘streaming’ &
> >> > > table source. I'm not sure whether FLIP-202[1] is better suited to a
> >> > > separate discussion, but I think your proposal is very helpful to the
> >> > > new source. It would be great if the new source could be compatible
> >> > > with this abstraction.
> >> > >
> >> > >    In addition, do we need to support an abstraction for the following
> >> > > special bounded scenario?
> >> > >    The number of JdbcSourceSplits is fixed, but the time needed to
> >> > > generate all of them is not known in advance in a user-defined
> >> > > implementation. Once the end condition of the JdbcSourceSplit
> >> > > generation process is met, no more JdbcSourceSplits are generated.
> >> > > After all JdbcSourceSplits have been processed, the reader is notified
> >> > > by the JdbcSourceSplitEnumerator that there are no more
> >> > > JdbcSourceSplits.
> >> > >
> >> > > - [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector
> >> > >
> >> > > Best regards,
> >> > > Roc Marshal
> >> > >
> >> > > On 2022/06/30 09:02:23 João Boto wrote:
> >> > > > Hi,
> >> > > >
> >> > > > On the source side we could improve the JdbcParameterValuesProvider
> >> > > > so that it can be defined as one or more queries, or as something
> >> > > > more dynamic.
> >> > > > Most of the time, if your job is dynamic or has some condition to be
> >> > > > met (based on data in a table), you have to create a connection and
> >> > > > get that info from the database.
> >> > > >
> >> > > > If we are going to create/allow a "streaming" JDBC source, we should
> >> > > > be able to define a watermark and get new data from the table using
> >> > > > that watermark.
> >> > > >
> >> > > >
> >> > > > For the sink (though it could apply to the source as well) it would
> >> > > > be great to be able to set your own implementation of the connection
> >> > > > type. For example, if you are connecting to ClickHouse, you could set
> >> > > > an implementation based on "BalancedClickhouseDataSource" (this[1]
> >> > > > implementation contains an example), or set an extended version of an
> >> > > > implementation for debugging purposes.
> >> > > >
> >> > > > Regards
> >> > > >
> >> > > >
> >> > > > [1] https://github.com/apache/flink/pull/20097/files#diff-8b36e3403381dc14c748aeb5de0b4ceb7d7daec39594b1eacff1694b5266419d
> >> > > >
> >> > > > On 2022/06/27 13:09:51 Roc Marshal wrote:
> >> > > > > Hi, all,
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > I would like to open a discussion on porting the JDBC Source to the
> >> > > > > new Source API (FLIP-27[1]).
> >> > > > >
> >> > > > > Martijn Visser, Jing Ge and I had a preliminary discussion on the
> >> > > > > JIRA issue FLINK-25420[2] and planned to start with the discussion
> >> > > > > about the source part first.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > Please let me know:
> >> > > > >
> >> > > > > - Any issues you have encountered with the old JDBC source;
> >> > > > > - Any new features or design you would like to see;
> >> > > > > - Any other suggestions from other angles...
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > You can find more details in FLIP-239[3].
> >> > > > >
> >> > > > > Looking forward to your feedback.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> >> > > > >
> >> > > > > [2] https://issues.apache.org/jira/browse/FLINK-25420
> >> > > > >
> >> > > > > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > Best regards,
> >> > > > >
> >> > > > > Roc Marshal
> >> > > >
> >> >
> >>
> >
>
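
For illustration, the idea raised in the thread above (cap the number of
unassigned splits held by the enumerator and compute the rest on the fly) could
be sketched roughly as follows. This is only a minimal sketch and does not use
the Flink API: LazySplitEnumerator, RangeSplit, nextSplit, bufferedSplits and
the assumption that splits are numeric key ranges are all hypothetical and are
not taken from FLIP-239 or the existing JDBC connector.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/**
 * Hypothetical, Flink-independent sketch of an enumerator that never buffers
 * more than `capacity` unassigned splits and derives further splits lazily.
 */
final class LazySplitEnumerator {

    /** Hypothetical stand-in for a JdbcSourceSplit: the key range [start, end). */
    static final class RangeSplit {
        final long start;
        final long end;

        RangeSplit(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    private final long maxKey;     // exclusive upper bound of the key space
    private final long splitSize;  // number of keys covered by a full split
    private final int capacity;    // upper bound on buffered, unassigned splits
    private final Deque<RangeSplit> unassigned = new ArrayDeque<>();
    private long nextStart;        // first key not yet covered by any split

    LazySplitEnumerator(long minKey, long maxKey, long splitSize, int capacity) {
        if (splitSize <= 0 || capacity <= 0 || minKey > maxKey) {
            throw new IllegalArgumentException("invalid range, split size or capacity");
        }
        this.nextStart = minKey;
        this.maxKey = maxKey;
        this.splitSize = splitSize;
        this.capacity = capacity;
    }

    /** Tops up the buffer, but never beyond `capacity` entries. */
    private void refill() {
        while (unassigned.size() < capacity && nextStart < maxKey) {
            // Compare against the remaining distance to avoid long overflow.
            long end = (maxKey - nextStart <= splitSize) ? maxKey : nextStart + splitSize;
            unassigned.addLast(new RangeSplit(nextStart, end));
            nextStart = end;
        }
    }

    /** Returns the next split, or empty once the whole key range is covered. */
    Optional<RangeSplit> nextSplit() {
        refill();
        return Optional.ofNullable(unassigned.pollFirst());
    }

    /** Number of splits currently held in memory. */
    int bufferedSplits() {
        return unassigned.size();
    }
}
```

A reader would call nextSplit() each time it asks for work; an empty result
corresponds to the "no more splits" notification in the bounded scenario
described above. Memory use on the enumerator side is then bounded by the
capacity rather than by the table size.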
