Hi all,

With the effort of FLIP-38 [1], the Python Table API (without UDF support
for now) will be available in the coming release 1.9.
As described in "Build PyFlink" [2], if users want to use the Python Table
API, they currently have to install it manually using the command:
"cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".

This is non-trivial for users, and it would be better if we followed the
Python way and published PyFlink to PyPI, which is a repository of software
for the Python programming language. Users could then use the standard
Python package manager "pip" to install PyFlink: "pip install pyflink". So
the following topics need to be discussed:

1. How to publish PyFlink to PyPI

1.1 Project Name
    We need to decide on the PyPI project name to use, for example
apache-flink, pyflink, etc.

    Regarding the name "pyflink": it has already been registered by
@ueqt, and there is already a package '1.0' released under this project,
which is based on flink-libraries/flink-python.

    @ueqt has kindly agreed to hand this project over to the community. He
has requested that the released package '1.0' not be removed, as it is
already used in his company.

    So we need to decide whether to use the name 'pyflink'. If yes, we
need to figure out how to handle the package '1.0' under this project.

    From my point of view, "pyflink" is the better name for our project,
and we can keep the 1.0 release, as people may still want to use it.

1.2 PyPI account for release
    We also need to decide which account to use to publish packages to PyPI.

    There are two permissions in PyPI: owner and maintainer:

    1) The owner can upload releases, and can delete files, releases, or
the entire project.
    2) The maintainer can also upload releases. However, they cannot delete
files, releases, or the project.

    So there are two options in my mind:

    1) Create an account such as 'pyflink' as the owner, share it with all
the release managers, and let release managers publish packages to PyPI
using this shared account.
    2) Create an account such as 'pyflink' as the owner (only the PMC can
manage it) and add the release managers' accounts as maintainers of the
project. Release managers then publish packages to PyPI using their own
accounts.

    As far as I know, PySpark takes option 1) and Apache Beam takes option 2).

    From my point of view, I prefer option 2) as it is safer: it
eliminates the risk of accidentally deleting old releases and at the same
time keeps a trace of who performed each operation.
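
    Whichever option we choose, the actual upload flow for a release
manager would look roughly as follows (a minimal sketch, assuming the
standard setuptools/twine toolchain; the account name is only a
placeholder):

        cd flink-python && python3 setup.py sdist
        # upload with the shared 'pyflink' account (option 1) or the
        # release manager's own maintainer account (option 2)
        twine upload -u <pypi-account> dist/*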

2. How to handle Scala 2.11 and Scala 2.12

The PyFlink package bundles the Flink jars. As we know, there are two
versions of the jars for each module: one for Scala 2.11 and the other for
Scala 2.12. So theoretically there would be two PyFlink packages. We need
to decide whether to publish one of them or both to PyPI. If both packages
are published, we may need two separate projects, such as pyflink_211 and
pyflink_212, and maybe more in the future, such as pyflink_213.
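
    For example, users would then pick the package that matches their
Scala version (project names hypothetical, following the naming sketched
above):

        pip install pyflink_211   # jars built against Scala 2.11
        pip install pyflink_212   # jars built against Scala 2.12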

    (BTW, I think we should bring up a discussion about dropping Scala 2.11
in the Flink 1.10 release, as Scala 2.13 became available in early June.)

    From my point of view, for now we should release only the Scala 2.11
version, as Scala 2.11 is the default version in Flink.

3. Legal problems of publishing to PyPI

As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011 [3],
publishing PyFlink to PyPI means that we will publish binaries to a
distribution channel not owned by Apache. We need to figure out whether
there are legal problems with this. From my point of view, there should be
none, as a few Apache projects such as Spark and Beam have already done it.
Frankly speaking, I am not familiar with this area, so feedback from anyone
who is more familiar with it is very welcome.

Great thanks to @ueqt for his willingness to donate the PyPI project name
`pyflink` to the Apache Flink community!!!
Great thanks to @Dian for the offline effort!!!

Best,
Jincheng

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
[2]
https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
[3] https://issues.apache.org/jira/browse/FLINK-13011
