Re: [DISCUSS] Moving connectors from Flink to external connector repositories

2022-01-13 Thread Martijn Visser
Hi everyone,

If you have any more comments or questions, please let me know. Otherwise,
I will open a vote on this thread in the next couple of days.

Best regards,

Martijn



Re: [DISCUSS] Moving connectors from Flink to external connector repositories

2022-01-06 Thread Qingsheng Ren
Thanks Martijn for driving this! 

I’m +1 for Martijn’s proposal. It’s important to avoid elevating some
connectors above others; all connectors should share the same quality
standard. Keeping a few basic connectors like FileSystem is reasonable,
since they’re crucial for letting new users try out and explore Flink quickly.

Another point I’d like to mention: we need to add more E2E cases that use
the basic connectors in the Flink main repo after we move the other
connectors out. Currently, the E2E tests depend heavily on connectors, and
it’s essential to keep the coverage and quality of the Flink main repo even
without those connectors’ E2E cases. A minimal sketch of such a test
follows below.
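
To make that concrete, here is a minimal sketch of such a test, assuming
Flink's Table API and the built-in DataGen and BlackHole connectors. The
class name and table names are made up for illustration; this is a sketch,
not an existing test in the repo:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BasicConnectorsSmokeTest {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // DataGen source: 1000 generated rows, no external system required.
        tEnv.executeSql(
                "CREATE TABLE src (id BIGINT, payload STRING) WITH ("
                        + " 'connector' = 'datagen',"
                        + " 'number-of-rows' = '1000')");

        // BlackHole sink: discards every row; we only care that the job runs.
        tEnv.executeSql(
                "CREATE TABLE snk (id BIGINT, payload STRING) WITH ("
                        + " 'connector' = 'blackhole')");

        // await() blocks until the bounded job finishes, so failures surface here.
        tEnv.executeSql("INSERT INTO snk SELECT id, payload FROM src").await();
    }
}

Because 'number-of-rows' makes the DataGen source bounded, the job
terminates on its own, which keeps such a check cheap enough for CI.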

Best regards, 

Qingsheng Ren





[DISCUSS] Moving connectors from Flink to external connector repositories

2022-01-05 Thread Martijn Visser
Hi everyone,

As already mentioned in the previous discussion thread [1], I'm opening up a
parallel discussion thread on moving connectors from Flink to external
connector repositories. If you haven't read up on this discussion before, I
recommend reading that one first.

The goal with the external connector repositories is to make it easier to
develop and release connectors by not being bound to the release cycle of
Flink itself. It should result in faster connector releases, a more active
connector community and a reduced build time for Flink.

We currently have the following connectors available in Flink itself:

* Kafka -> For DataStream & Table/SQL users
* Upsert-Kafka -> For Table/SQL users
* Cassandra -> For DataStream users
* Elasticsearch -> For DataStream & Table/SQL users
* Kinesis -> For DataStream & Table/SQL users
* RabbitMQ -> For DataStream users
* Google Cloud PubSub -> For DataStream users
* Hybrid Source -> For DataStream users
* NiFi -> For DataStream users
* Pulsar -> For DataStream users
* Twitter -> For DataStream users
* JDBC -> For DataStream & Table/SQL users
* FileSystem -> For DataStream & Table/SQL users
* HBase -> For DataStream & Table/SQL users
* DataGen -> For Table/SQL users
* Print -> For Table/SQL users
* BlackHole -> For Table/SQL users
* Hive -> For Table/SQL users

I would propose to move out all connectors except Hybrid Source,
FileSystem, DataGen, Print and BlackHole because:

* We should avoid at all costs that certain connectors are considered
'Core' connectors. If that happens, it creates the perception that
connectors are first-grade/high-quality because they are in 'Core' Flink
and second-grade/lesser-quality because they are outside of the Flink
codebase. It also directly hurts the goal, because connectors in 'Core'
Flink would still be bound to Flink's release cycle. Last but not least, it
risks the success of the external connector repositories, since every
connector contributor would still want to be in 'Core' Flink.
* To continue on connector quality: we should aim for all connectors to be
of high quality. That means a connector shouldn't be available only to
DataStream users or only to Table/SQL users, but to both. It also means
that, where applicable, a connector should support the full set of
capabilities: bounded and unbounded scan, lookup, and batch and streaming
sinks. In the end, quality should depend on the maintainers of the
connector, not on where the code is maintained. (A sketch of what dual-API
availability looks like follows right after this list.)
* The Hybrid Source connector is special because of its purpose: it doesn't
integrate with an external system itself, but stitches other sources
together (for example, a bounded file backfill followed by an unbounded
source), so it sits naturally next to Flink's core source API. (See the
second sketch after this list.)
* The FileSystem, DataGen, Print and BlackHole connectors are important for
first-time Flink users/testers. If you want to experiment with Flink, you
will most likely start with a local file before moving to one of the other
sources or sinks. These four connectors cover reading/writing local files
and generating/displaying/ignoring data. (See the third sketch after this
list.)
* Some of the connectors haven't been maintained in a long time (for
example, NiFi and Google Cloud PubSub). An argument could be made that, per
connector, we should check whether we actually want to move it or decide to
drop it entirely.
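
To make the dual-API point concrete, here is a minimal sketch of what
"available to both DataStream and Table/SQL users" looks like for a
connector such as Kafka. It assumes the flink-connector-kafka artifact and
the unified KafkaSource; the bootstrap address, topic, group ids and field
names are placeholders, not something from this thread:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class DualApiKafkaExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // DataStream API: the connector consumed through KafkaSource.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")  // placeholder address
                .setTopics("orders")                    // placeholder topic
                .setGroupId("demo")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        DataStream<String> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka");

        // Table/SQL API: the same external system, declared via DDL for SQL users.
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
        tEnv.executeSql(
                "CREATE TABLE orders (payload STRING) WITH ("
                        + " 'connector' = 'kafka',"
                        + " 'topic' = 'orders',"
                        + " 'properties.bootstrap.servers' = 'localhost:9092',"
                        + " 'properties.group.id' = 'demo-sql',"
                        + " 'scan.startup.mode' = 'earliest-offset',"
                        + " 'format' = 'raw')");

        stream.print();
        env.execute("dual-api-kafka");
    }
}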
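
And because "special because of its purpose" is easy to gloss over, here is
a sketch of what Hybrid Source does differently: it composes other sources
instead of talking to an external system itself. This assumes the Flink
1.14-era FileSource/TextLineFormat and KafkaSource classes; the file path
and Kafka settings are placeholders:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded backfill: read historical records from files first.
        FileSource<String> backfill = FileSource
                .forRecordStreamFormat(new TextLineFormat(), new Path("/tmp/backfill"))
                .build();

        // Unbounded tail: then switch over to live data from Kafka.
        KafkaSource<String> live = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")  // placeholder address
                .setTopics("orders")                    // placeholder topic
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Hybrid Source stitches the two together as one logical source.
        HybridSource<String> hybrid =
                HybridSource.builder(backfill).addSource(live).build();

        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid").print();
        env.execute("hybrid-source");
    }
}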
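
Finally, to illustrate the first-time-user argument: the sketch below reads
a local file and prints every row, touching only connectors that would stay
in the main repo. The file path and schema are invented for the example:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LocalFileQuickstart {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // FileSystem connector: read a local CSV file (placeholder path/schema).
        tEnv.executeSql(
                "CREATE TABLE input (line STRING) WITH ("
                        + " 'connector' = 'filesystem',"
                        + " 'path' = 'file:///tmp/input.csv',"
                        + " 'format' = 'csv')");

        // Print connector: show every row on stdout, no external system needed.
        tEnv.executeSql(
                "CREATE TABLE output (line STRING) WITH ('connector' = 'print')");

        // Run the pipeline and wait for the bounded file read to finish.
        tEnv.executeSql("INSERT INTO output SELECT line FROM input").await();
    }
}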

I'm looking forward to your thoughts!

Best regards,

Martijn Visser | Product Manager

mart...@ververica.com

[1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm

Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time