Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

Benchao Li Thu, 24 Sep 2020 05:19:30 -0700

Hi Jingsong,

Thanks for preparing this FLIP.
WRT ParallelismProvider, it looks good to me.


Kurt Young <[email protected]> 于2020年9月24日周四 下午4:14写道：

> Yeah, JDBC is definitely a popular use case we should consider.
>
> Best,
> Kurt
>
>
> On Thu, Sep 24, 2020 at 4:11 PM Flavio Pompermaier <[email protected]>
> wrote:
>
> > Hi Kurt, in the past we had a very interesting use case in this regard:
> our
> > customer (oracle) db was quite big and running too many queries in
> parallel
> > was too heavy and it was causing the queries to fail.
> > So we had to limit the source parallelism to 2 threads. After the
> fetching
> > of the data the other operators could use the max parallelism as usual..
> >
> > Best,
> > Flavio
> >
> > On Thu, Sep 24, 2020 at 9:59 AM Kurt Young <[email protected]> wrote:
> >
> > > Thanks Jingsong for driving this, this is indeed a useful feature and
> > lots
> > > of users are asking for it.
> > >
> > > For setting a fixed source parallelism, I'm wondering whether this is
> > > necessary. For kafka,
> > > I can imagine users would expect Flink will use the number of
> partitions
> > as
> > > the parallelism. If it's too
> > > large, one can use the max parallelism to make it smaller.
> > > But for ES, which doesn't have ability to decide a reasonable
> parallelism
> > > on its own, it might make sense
> > > to introduce a user specified parallelism for such table source.
> > >
> > > So I think it would be better to reorganize the document a little bit,
> to
> > > explain the connectors one by one. Briefly
> > > introduce use cases and what kind of options are needed in your
> opinion.
> > >
> > > Regarding the interface `DataStreamScanProvider`, a concrete example
> > would
> > > help the discussion. What kind
> > > of scenarios do you want to support? And what kind of connectors need
> > such
> > > an interface?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Sep 23, 2020 at 9:30 PM admin <[email protected]> wrote:
> > >
> > > > +1,it’s a good news
> > > >
> > > > > 2020年9月23日 下午6:22，Jingsong Li <[email protected]> 写道：
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion about improving the new TableSource
> > and
> > > > > TableSink interfaces.
> > > > >
> > > > > Most connectors have been migrated to FLIP-95, but there are still
> > the
> > > > > Filesystem and Hive that have not been migrated. They have some
> > > > > requirements on table connector API. And users also have some
> > > additional
> > > > > requirements：
> > > > > - Some connectors have the ability to infer parallelism, the
> > > parallelism
> > > > is
> > > > > good for most cases.
> > > > > - Users have customized parallelism configuration requirements for
> > > source
> > > > > and sink.
> > > > > - The connectors need to use topology to build their source/sink
> > > instead
> > > > of
> > > > > a single function. Like JIRA[1], Partition Commit feature and File
> > > > > Compaction feature.
> > > > >
> > > > > Details are in [2].
> > > > >
> > > > > [1]https://issues.apache.org/jira/browse/FLINK-18674
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > >
> > > >
> >
>


-- 

Best,
Benchao Li

Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

Reply via email to