Hi everyone,

As already mentioned in the previous discussion thread [1], I'm opening a
parallel discussion thread on moving connectors out of Flink into external
connector repositories. If you haven't followed that discussion yet, I
recommend reading it first.

The goal of the external connector repositories is to make it easier to
develop and release connectors by not being bound to the release cycle of
Flink itself. This should result in faster connector releases, a more
active connector community and a reduced build time for Flink.

We currently have the following connectors available in Flink itself:

* Kafka -> For DataStream & Table/SQL users
* Upsert-Kafka -> For Table/SQL users
* Cassandra -> For DataStream users
* Elasticsearch -> For DataStream & Table/SQL users
* Kinesis -> For DataStream & Table/SQL users
* RabbitMQ -> For DataStream users
* Google Cloud PubSub -> For DataStream users
* Hybrid Source -> For DataStream users
* NiFi -> For DataStream users
* Pulsar -> For DataStream users
* Twitter -> For DataStream users
* JDBC -> For DataStream & Table/SQL users
* FileSystem -> For DataStream & Table/SQL users
* HBase -> For DataStream & Table/SQL users
* DataGen -> For Table/SQL users
* Print -> For Table/SQL users
* BlackHole -> For Table/SQL users
* Hive -> For Table/SQL users

I propose moving out all connectors except Hybrid Source, FileSystem,
DataGen, Print and BlackHole, because:

* We should avoid at all costs a situation where certain connectors are
considered 'Core' connectors. If that happens, it creates the perception
that there are first-grade/high-quality connectors because they live in
'Core' Flink and second-grade/lesser-quality connectors because they live
outside the Flink codebase. It also directly hurts the goal, because those
connectors would still be bound to the release cycle of Flink. Last but not
least, it risks the success of the external connector repositories, since
every connector contributor would still want to be in 'Core' Flink.
* Continuing on the topic of connector quality, we should aim for all
connectors to be of high quality. That means we shouldn't have connectors
that are only available to either DataStream or Table/SQL users, but to
both. It also means that (where applicable) a connector should support all
capabilities: bounded and unbounded scan, lookup, and batch and streaming
sinks. In the end, quality should depend on the maintainers of the
connector, not on where the code is maintained.
* The Hybrid Source connector is a special case because of its purpose: it
doesn't connect to an external system itself, but combines other sources.
* The FileSystem, DataGen, Print and BlackHole connectors are important for
first-time Flink users and testers. If you want to experiment with Flink,
you will most likely start with a local file before moving to one of the
other sources or sinks. These four connectors help with reading/writing
local files or generating/displaying/ignoring data.
* Some of the connectors haven't been maintained in a long time (for
example, NiFi and Google Cloud PubSub). An argument could be made that we
should check whether we actually want to move such connectors, or decide to
drop them entirely.

I'm looking forward to your thoughts!

Best regards,

Martijn Visser | Product Manager

mart...@ververica.com

[1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
