Hi Ran Tao,

Thanks for opening this topic. I think there are a couple of things at hand:
1. Connectors shouldn't rely on dependencies that may or may not be
available in Flink itself, as we've seen with flink-shaded. Relying on them
creates exactly the tight coupling between Flink and the connectors that
we're trying to avoid.
2. When following that line, that would also be applicable for things like
commons-collections and commons-io. If a connector wants to use them, it
should make sure that it bundles those artifacts itself.
3. While flink-python currently fails the CI, it shouldn't actually depend
on the externalized connectors. I'm not sure what PyFlink does with it, but
if it belongs to the connector code, that code should also be moved to the
individual connector repo. If it's just a generic test, we could consider
creating a generic test against released connector versions to determine
compatibility.
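
To illustrate point 2: a connector could bundle such dependencies itself
via the Maven Shade Plugin in its own pom.xml. A rough sketch (the included
artifacts and the relocation pattern below are just examples, not what any
connector actually uses today):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <artifactSet>
              <includes>
                <!-- bundle the commons dependencies the connector needs -->
                <include>org.apache.commons:commons-collections4</include>
                <include>commons-io:commons-io</include>
              </includes>
            </artifactSet>
            <relocations>
              <!-- relocate to avoid clashing with whatever version Flink ships -->
              <relocation>
                <pattern>org.apache.commons</pattern>
                <shadedPattern>org.apache.flink.connector.shaded.org.apache.commons</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>

With the relocation in place, the connector no longer cares which version
of these libraries Flink itself upgrades to.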

I'm curious about the opinions of others as well.

Best regards,

Martijn

On Mon, Jul 3, 2023 at 3:37 PM Ran Tao <chucheng...@gmail.com> wrote:

> I have an issue here that requires upgrading commons-collections[1] (this
> is just an example), but the PR CI fails because flink-python test cases
> depend on flink-sql-connector-kafka, and flink-sql-connector-kafka is a
> small jar that does not include this dependency, so the Flink CI throws an
> exception[2]. My current solution is [3]. But even if this PR is done, the
> upgrade in Flink still requires a kafka-connector release.
>
> This issue points to a deeper problem. Although the connectors have been
> externalized, many UTs of flink-python depend on these connectors, and a
> basic agreement for externalized connectors is that other dependencies
> cannot be introduced explicitly, which means the externalized connectors
> use dependencies inherited from Flink. As a result, when Flink upgrades
> some dependencies, the flink-python test cases easily fail, because Flink
> no longer contains the class and the connector doesn't bundle it either.
> It's a circular problem.
>
> Unless the connector self-consistently bundles all of its dependencies,
> which is uncontrollable (only a few connectors include all jars in the
> shade phase).
>
> In short, the flink-python module's current dependencies on the connectors
> leave the externalization and decoupling incomplete, which leads to
> circular dependencies whenever Flink upgrades or changes some
> dependencies.
>
> I hope I've made this clear. I'd like to get everyone's opinions on what
> better solutions we should adopt for similar problems in the future.
>
> [1] https://issues.apache.org/jira/browse/FLINK-30274
> [2]
>
> https://user-images.githubusercontent.com/11287509/250120404-d12b60f4-7ff3-457e-a2c4-8cd415bb5ca2.png
>
>
> https://user-images.githubusercontent.com/11287509/250120522-6b096a4f-83f0-4287-b7ad-d46b9371de4c.png
> [3] https://github.com/apache/flink-connector-kafka/pull/38
>
> Best Regards,
> Ran Tao
> https://github.com/chucheng92
>
