If there is no other way, then I would say let's go with splitting the
modules. This is already better than keeping the Flink binaries bundled
with every Python/platform package.

Cheers,
Till
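
To put rough numbers on the difference, here is a minimal sketch of the per-release size arithmetic quoted further down in this thread (about 220MB per bundled package, about 3MB per package without jars, 7 packages per release, about 7.5GB of PyPI space left). The function and variable names are illustrative only, and the figures are the approximate ones mentioned in the thread, not measured values.

```python
# Rough per-release PyPI footprint, using the approximate figures from this
# thread. All names are illustrative; nothing here is part of PyFlink itself.

def release_size_mb(num_packages, package_mb, shared_mb=0):
    """Total upload size of one release in MB."""
    return num_packages * package_mb + shared_mb

# Jars bundled into every platform/Python-version package.
bundled_mb = release_size_mb(7, 220)
# Small per-platform packages plus one shared jar-only package.
split_mb = release_size_mb(7, 3, shared_mb=220)

remaining_mb = 7500  # about 7.5GB of project space left, per the thread
releases_bundled = remaining_mb // bundled_mb
releases_split = remaining_mb // split_mb

print(bundled_mb, split_mb, releases_bundled, releases_split)
```

Under these assumptions the remaining space covers only a handful of bundled releases, but several times as many split releases.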

On Mon, Mar 22, 2021 at 8:28 AM Xingbo Huang <hxbks...@gmail.com> wrote:

> When we **pip install** a wheel package, it just unpacks the wheel package
> and installs its dependencies[1]. There is no way to download things from
> an external website during installation. It works differently from the
> source package where we could download something in the setup.py. This is
> explained in detail in [2]. So I'm afraid that splitting the package is the
> only solution we have if we want to reduce the package size of pyflink.
>
> [1] https://www.python.org/dev/peps/pep-0427/
> [2] https://realpython.com/python-wheels/#advantages-of-python-wheels
>
> Best,
> Xingbo
>
> On Fri, Mar 19, 2021 at 6:32 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > I think that we should try to reduce the size of the packages by either
> > splitting them or by having another means to retrieve the Java binaries.
> >
> > Cheers,
> > Till
> >
> > On Fri, Mar 19, 2021 at 2:58 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> >
> > > Hi Till,
> > >
> > > The package size of tensorflow[1] is also very big (about 300MB+).
> > > However, it does not try to solve the problem; it just asks PyPI to
> > > expand the space limit whenever the project space is full. We could
> > > also choose this option. At our current release frequency, we would
> > > probably need to apply for a 15GB expansion every year. There are not
> > > many similar cases, so there is no standard solution to refer to.
> > > However, splitting a project into multiple packages is quite common.
> > > For example, Apache Airflow prepares a separate release package for
> > > each provider[2].
> > >
> > > So I currently see two solutions that could work.
> > >
> > > 1. Just keep the current solution and expand the space limit in PyPI
> > > whenever the space is full.
> > >
> > > 2. Split into two packages to reduce the wheel package size.
> > >
> > > [1] https://pypi.org/project/tensorflow/#files
> > > [2] https://pypi.org/search/?q=apache-airflow-*&o=
> > >
> > > Best,
> > > Xingbo
> > >
> > > On Wed, Mar 17, 2021 at 9:22 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > >
> > > > How do other projects solve this problem?
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Mar 17, 2021 at 3:45 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> > > >
> > > > > Hi Chesnay,
> > > > >
> > > > > Yes, in most cases we can indeed download the required jars in
> > > > > `setup.py`, which is also the solution I originally thought of for
> > > > > reducing the size of the wheel packages. However, I'm afraid that it
> > > > > will not work in scenarios where accessing the external network is
> > > > > not possible, which is very common in production clusters.
> > > > >
> > > > > Best,
> > > > > Xingbo
> > > > >
> > > > > On Tue, Mar 16, 2021 at 8:32 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > >
> > > > > > This proposed apache-flink-libraries package would just contain
> > > > > > the binary, right? And effectively be unusable to the Python
> > > > > > audience on its own.
> > > > > >
> > > > > > Essentially we are just abusing PyPI for shipping a Java binary.
> > > > > > Is there no way for us to download the jars when the Python
> > > > > > package is being installed? (e.g., in setup.py)
> > > > > >
> > > > > > On 3/16/2021 1:23 PM, Dian Fu wrote:
> > > > > > > Yes, the size of the .whl files in PyFlink will also be about 3MB
> > > > > > > if we split the package. Currently the package is big because we
> > > > > > > bundle the jar files in it.
> > > > > > >
> > > > > > >> On Mar 16, 2021, at 8:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
> > > > > > >>
> > > > > > >> Key difference being that the Beam .whl files are 3MB, i.e. 60x
> > > > > > >> smaller.
> > > > > > >>
> > > > > > >> On 3/16/2021 1:06 PM, Dian Fu wrote:
> > > > > > >>> Hi Chesnay,
> > > > > > >>>
> > > > > > >>> We publish separate binary packages for:
> > > > > > >>> 1) each Python version: 3.5 / 3.6 / 3.7 / 3.8 (since 1.12)
> > > > > > >>> 2) each platform: Linux / Mac
> > > > > > >>>
> > > > > > >>> Besides, there is also a source package which is used when none
> > > > > > >>> of the above binary packages is usable, e.g. for Windows users.
> > > > > > >>>
> > > > > > >>> PS: publishing multiple binary packages is very common in the
> > > > > > >>> Python world, e.g. Beam published 22 packages in 2.28 [1] and
> > > > > > >>> Pandas published 16 packages in 1.2.3 [2]. We could also publish
> > > > > > >>> more packages if we split the package, as the cost of adding
> > > > > > >>> another package would be very small.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Dian
> > > > > > >>>
> > > > > > >>> [1] https://pypi.org/project/apache-beam/#files
> > > > > > >>> [2] https://pypi.org/project/pandas/#files
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Hi Xintong,
> > > > > > >>>
> > > > > > >>> Yes, you are right that there are 9 packages in 1.12, as we
> > > > > > >>> added Python 3.8 support in 1.12.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Dian
> > > > > > >>>
> > > > > > >>>> On Mar 16, 2021, at 7:45 PM, Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >>>>
> > > > > > >>>> And it's uploaded not only to PyPI, but to the ASF mirrors as
> > > > > > >>>> well.
> > > > > > >>>>
> > > > > > >>>> https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/
> > > > > > >>>>
> > > > > > >>>> Thank you~
> > > > > > >>>>
> > > > > > >>>> Xintong Song
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >>>>
> > > > > > >>>>> Actually, I think it's 9 packages, not 7.
> > > > > > >>>>>
> > > > > > >>>>> Check here for the 1.12.2 packages.
> > > > > > >>>>> https://pypi.org/project/apache-flink/#files
> > > > > > >>>>>
> > > > > > >>>>> Thank you~
> > > > > > >>>>>
> > > > > > >>>>> Xintong Song
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Am I reading this correctly that we publish 7 different
> > > > > > >>>>>> artifacts just for Python?
> > > > > > >>>>>> What does the release matrix look like?
> > > > > > >>>>>>
> > > > > > >>>>>> On 3/16/2021 3:45 AM, Dian Fu wrote:
> > > > > > >>>>>>> Hi Xingbo,
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> Thanks a lot for bringing up this discussion. Actually, the
> > > > > > >>>>>>> size limit already became an issue when releasing 1.11.3 and
> > > > > > >>>>>>> 1.12.1. It blocked us from publishing the PyFlink packages to
> > > > > > >>>>>>> PyPI during the release, as there was not enough space left
> > > > > > >>>>>>> (PS: we have already published the packages after the size
> > > > > > >>>>>>> limit was increased).
> > > > > > >>>>>>> Considering that the total package size is about 1.5GB
> > > > > > >>>>>>> (220MB * 7) for each release, it makes sense to split the
> > > > > > >>>>>>> PyFlink package. It could reduce the total package size to
> > > > > > >>>>>>> about 250MB (3MB * 7 + 220MB) for each release. We would not
> > > > > > >>>>>>> need to increase the size limit any more in the next few
> > > > > > >>>>>>> years, as we currently still have about 7.5GB of space left.
> > > > > > >>>>>>> So +1 from my side.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Regards,
> > > > > > >>>>>>> Dian
> > > > > > >>>>>>>
> > > > > > >>>>>>>> On Mar 12, 2021, at 2:30 PM, Xingbo Huang <hxbks...@gmail.com> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Hi everyone,
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Since release-1.11, pyflink has introduced cython support,
> > > > > > >>>>>>>> and we release 7 packages (for different platforms and
> > > > > > >>>>>>>> Python versions) to PyPI for each release. The size of each
> > > > > > >>>>>>>> package is more than 200MB, as we need to bundle the jar
> > > > > > >>>>>>>> files into the package. The entire project space in PyPI
> > > > > > >>>>>>>> grows very fast, and we need to apply to PyPI for more
> > > > > > >>>>>>>> project space frequently. Please refer to
> > > > > > >>>>>>>> [https://github.com/pypa/pypi-support/issues/831]
> > > > > > >>>>>>>> for more details.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> The root cause of this problem is that we bundle the jar
> > > > > > >>>>>>>> files in each package. This would be unnecessary if we
> > > > > > >>>>>>>> extracted the jar files into a separate package dedicated to
> > > > > > >>>>>>>> holding them.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I’d like to propose splitting the pyflink package into two
> > > > > > >>>>>>>> packages: the original apache-flink and
> > > > > > >>>>>>>> apache-flink-libraries (any suggestions for the name?). The
> > > > > > >>>>>>>> apache-flink-libraries package only contains the jar files,
> > > > > > >>>>>>>> and there is only one apache-flink-libraries package per
> > > > > > >>>>>>>> release. The apache-flink package depends on
> > > > > > >>>>>>>> apache-flink-libraries, so users still only need to install
> > > > > > >>>>>>>> apache-flink and nothing changes for them. We still need to
> > > > > > >>>>>>>> release multiple wheel packages of apache-flink. However,
> > > > > > >>>>>>>> they will be very small as they no longer contain the jar
> > > > > > >>>>>>>> files.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Looking forward to your feedback.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Best,
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Xingbo
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
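
The dependency between the two proposed packages can be sketched as follows. This is a minimal illustration, not the actual PyFlink build: the helper name `apache_flink_setup_kwargs`, the exact `==` version pin, and the placeholder version string are assumptions; only the two package names come from the proposal above. The idea is that apache-flink declares apache-flink-libraries in `install_requires`, so installing apache-flink also pulls in the jar-only package.

```python
# Sketch only: how a slimmed-down apache-flink package might declare its
# dependency on the jar-only package. The helper name and the "==" pin are
# assumptions made for this example.

def apache_flink_setup_kwargs(version):
    """Keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "apache-flink",
        "version": version,
        # Hypothetical pin: the jar package should match the Flink release.
        "install_requires": ["apache-flink-libraries==" + version],
    }

kwargs = apache_flink_setup_kwargs("1.13.0")  # placeholder version
print(kwargs["install_requires"])
```

In a real setup.py these keyword arguments would be passed on as `setuptools.setup(**kwargs)`. Since pip resolves declared dependencies from the wheel metadata at install time, this approach needs no download logic in setup.py.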
