Hi Peter,

Thanks a lot for taking care of this. Appreciate it!

Re flink-sql-client-1.18.0.jar: The original motivation of including
it in the PyFlink package is that we want to make sure that Python
users get most things they need after piping install PyFlink. So most
artifacts of a normal Flink distribution are included in the PyFlink
package. Actually there are several jars under the opt directory of a
Flink distribution, only flink-sql-client-1.18.0.jar is chosen because
it seems the most useful one among them for users. The reason
flink-sql-client-1.18.0.jar is not added to classpath is because this
JAR is usually used by the sql-client.sh which is located under the
bin directory. I'm fine to remove it (since this is not a must-to-have
dependency for Python users) if we think it's not necessary for most
users (users could manually download it if needed). In this case, we
may also need to remove some shell scripts under the bin directory,
e.g. sql-client.sh, etc since it may not work any more.

Re including connector jars under opt directory: Big +1 to this if
this is feasible. One problem I can think of is that currently the
connector repositories are externalized and depend on the Flink
repository and so when releasing Flink & PyFlink, the connectors are
still not released and so may not be possible to be included into the
PyFlink package.

Re marking lint-python.sh and install_command.sh as public, I'm +1 to
Gabor's point. The original scripts are not designed to be used by
external projects and so I guess we may need to refactor them a bit to
make them more general. @Experimental sounds reasonable to me.

Regards,
Dian

On Thu, Nov 30, 2023 at 12:27 AM Gabor Somogyi
<gabor.g.somo...@gmail.com> wrote:
>
> Hi Peter,
>
> Thanks for picking this up! Please find my thoughts inline.
>
> BR,
> G
>
>
> On Mon, Nov 27, 2023 at 4:05 PM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
> > Hi Team,
> >
> > During some, mostly unrelated work we come to realize that during the
> > externalization of the connector's python modules and the related tests are
> > not moved to the respective connectors repository.
> > We created the jiras [1] to create the infra, and to move the python code
> > and the tests to the respective repos.
> >
> > When working on this code, I have found several oddities, which I would
> > like to hear the community's opinion on:
>
> - Does anyone know what the
> > site-packages/pyflink/opt/flink-sql-client-1.18.0.jar supposed to do? We
> > specifically put it there [2], but then we ignore it when we check the
> > directory of jars [3]. If there is no objection, I would remove it.
> >
> +1 on removing that unless there are objections. As I see  the jar is not
> included in the classpath so not used.
>
> > - I would like to use the `opt` directory, to contain optional jars created
> > by the connectors, like flink-sql-connector-kafka-3.1.x.jar.
> >
> +1 on that. The other options would be:
>  * plugins -> Kafka connector is not used as plugin w/ separate classloader
> so this solution is not preferred
>  * lib -> here we store our core jars which are part of the main Flink repo so
> this solution is not preferred
> All in all I'm fine to use opt.
>
> > Also, the lint-python.sh [4], and install_command.sh [5] provides the base
> > of the testing infra. Would there be any objections to mark these as a
> > public apis for the connectors?
>
> I fully agree that we should avoid code duplications and re-using the
> existing code parts but making them API
> is not what I can imagine. The reason behind is simple. That would just
> block development in mid/long term in Flink main repo.
>  * lint-python.sh is far from a stable version from my understanding + that
> way Flink will contain the superset of maybe 10+ connector needs
>  * install_command.sh contains generic and Flink specific parts. When we
> extract the Flink specific parts we can declare it logically API
> What I can imagine is to make the mentioned scripts more configurable and
> make them @Experimental.
>
> When a development team of a connector is not happy to use it as-is then a
> deep-copy and/or rewrite is still an option.
>
> > Thanks,
> > Peter
> >
> > [1] - https://issues.apache.org/jira/browse/FLINK-33528
> > [2] -
> >
> > https://github.com/apache/flink/blob/2da9a9639216b8c48850ee714065f090a80dcd65/flink-python/apache-flink-libraries/setup.py#L129-L130
> > [3] -
> >
> > https://github.com/apache/flink/blob/2da9a9639216b8c48850ee714065f090a80dcd65/flink-python/pyflink/pyflink_gateway_server.py#L183-L190
> > [4] -
> > https://github.com/apache/flink/blob/master/flink-python/dev/lint-python.sh
> > [5] -
> >
> > https://github.com/apache/flink/blob/master/flink-python/dev/install_command.sh
> >

Reply via email to