If there is no other way, then I would say let's go with splitting the modules. This is already better than keeping the Flink binaries bundled with every Python/platform package.
Cheers,
Till

On Mon, Mar 22, 2021 at 8:28 AM Xingbo Huang <hxbks...@gmail.com> wrote:

When we **pip install** a wheel package, pip just unpacks the wheel and installs its dependencies [1]. There is no way to download things from an external website during installation. This works differently from a source package, where we could download something in setup.py; this is explained in detail in [2]. So I'm afraid that splitting the package is the only solution we have if we want to reduce the package size of PyFlink.

[1] https://www.python.org/dev/peps/pep-0427/
[2] https://realpython.com/python-wheels/#advantages-of-python-wheels

Best,
Xingbo

On Fri, Mar 19, 2021 at 6:32 PM Till Rohrmann <trohrm...@apache.org> wrote:

I think that we should try to reduce the size of the packages, either by splitting them or by having another means to retrieve the Java binaries.

Cheers,
Till

On Fri, Mar 19, 2021 at 2:58 AM Xingbo Huang <hxbks...@gmail.com> wrote:

Hi Till,

The package size of tensorflow [1] is also very big (about 300MB+). However, it does not try to solve the problem; instead, it has its space limit on PyPI expanded whenever the project space is full. We could also choose this option. At our current release frequency, we would probably need to apply for a 15GB expansion every year. There are not many similar cases, so there is no standard solution to refer to. But splitting a project into multiple packages is quite common. For example, Apache Airflow prepares a separate release package for each provider [2].

So there are currently two solutions in my mind that could work:

1. Keep the current solution and expand the space limit on PyPI whenever the space is full.

2. Split into two packages to reduce the wheel package size.
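The two options can be compared with some back-of-the-envelope arithmetic, using the approximate figures Dian gives further down in the thread: about 220MB per package and 7 packages per release today, about 3MB per package once the jars move out, one shared jar package of about 220MB, and about 7.5GB of quota left. This is only a rough sketch, not PyPI's own accounting; it assumes 1GB = 1024MB, and the helper names are made up for illustration.

```python
# Rough PyPI quota arithmetic using the approximate sizes quoted in this thread.
MB_PER_GB = 1024  # assumption: binary gigabytes


def per_release_mb(num_packages: int, package_mb: float, libraries_mb: float = 0.0) -> float:
    """Upload size of one release: num_packages packages plus an optional jar-only package."""
    return num_packages * package_mb + libraries_mb


def releases_until_full(quota_left_gb: float, release_mb: float) -> int:
    """How many more complete releases fit into the remaining project quota."""
    return int(quota_left_gb * MB_PER_GB // release_mb)


# Option 1: jars bundled into every package.
bundled = per_release_mb(num_packages=7, package_mb=220)
# Option 2: small packages plus one shared apache-flink-libraries upload.
split = per_release_mb(num_packages=7, package_mb=3, libraries_mb=220)

print(f"bundled: {bundled:.0f} MB/release, {releases_until_full(7.5, bundled)} releases left")
print(f"split:   {split:.0f} MB/release, {releases_until_full(7.5, split)} releases left")
```

With these inputs the bundled layout uses about 1540MB per release (room for 4 more releases), while the split layout uses about 241MB (room for 31), which matches the "1.5GB" and "about 250MB" estimates in the thread.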
[1] https://pypi.org/project/tensorflow/#files
[2] https://pypi.org/search/?q=apache-airflow-*&o=

Best,
Xingbo

On Wed, Mar 17, 2021 at 9:22 PM Till Rohrmann <trohrm...@apache.org> wrote:

How do other projects solve this problem?

Cheers,
Till

On Wed, Mar 17, 2021 at 3:45 AM Xingbo Huang <hxbks...@gmail.com> wrote:

Hi Chesnay,

Yes, in most cases we can indeed download the required jars in `setup.py`, which is also the solution I originally considered for reducing the size of the wheel packages. However, I'm afraid it will not work in scenarios where accessing the external network is not possible, which is very common in production clusters.

Best,
Xingbo

On Tue, Mar 16, 2021 at 8:32 PM Chesnay Schepler <ches...@apache.org> wrote:

This proposed apache-flink-libraries package would just contain the binary, right? And it would effectively be unusable to the Python audience on its own.

Essentially we are just abusing PyPI for shipping a Java binary. Is there no way for us to download the jars when the Python package is being installed (e.g., in setup.py)?

On 3/16/2021 1:23 PM, Dian Fu wrote:

Yes, the size of the .whl files in PyFlink will also be about 3MB if we split the package. Currently the package is big because we bundle the jar files in it.

On Mar 16, 2021, at 8:13 PM, Chesnay Schepler <ches...@apache.org> wrote:

Key difference being that the Beam .whl files are 3MB large, aka 60x smaller.
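To make Xingbo's answer to Chesnay concrete: a wheel is a plain zip archive (PEP 427) whose `*.dist-info/METADATA` file lists dependencies as `Requires-Dist` entries. Installing it means copying the archive members into place and resolving those dependencies; no `setup.py` runs, so there is no hook for fetching jars from the network. The sketch below builds a toy wheel in memory and reads it back. The file layout, the version `1.13.0`, and the exact pin on `apache-flink-libraries` are hypothetical illustrations of the proposed split, and the toy omits the `RECORD` file a real wheel must contain.

```python
import io
import zipfile
from email.parser import Parser


def build_toy_wheel() -> bytes:
    """Build an in-memory stand-in for a split apache-flink wheel (hypothetical contents)."""
    metadata = (
        "Metadata-Version: 2.1\n"
        "Name: apache-flink\n"
        "Version: 1.13.0\n"  # hypothetical version
        "Requires-Dist: apache-flink-libraries==1.13.0\n"  # the jar-only package
        "\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pyflink/__init__.py", "")
        zf.writestr("apache_flink-1.13.0.dist-info/METADATA", metadata)
        zf.writestr("apache_flink-1.13.0.dist-info/WHEEL", "Wheel-Version: 1.0\nRoot-Is-Purelib: false\n")
    return buf.getvalue()


def inspect_wheel(wheel_bytes: bytes):
    """Return (archive member names, Requires-Dist entries) of a wheel.

    Everything an installer needs is already inside the archive; the only
    "action" is resolving the declared dependencies.
    """
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        names = zf.namelist()
        meta_name = next(n for n in names if n.endswith(".dist-info/METADATA"))
        meta = Parser().parsestr(zf.read(meta_name).decode("utf-8"))
    return names, meta.get_all("Requires-Dist") or []


names, requires = inspect_wheel(build_toy_wheel())
print(requires)  # ['apache-flink-libraries==1.13.0']
```

Under the proposed split, the jars would reach the user through exactly this dependency mechanism rather than through an install-time download, which also keeps installation working on clusters without external network access.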
On 3/16/2021 1:06 PM, Dian Fu wrote:

Hi Chesnay,

We publish binary packages separately for:
1) Python 3.5 / 3.6 / 3.7 / 3.8 (3.8 since 1.12)
2) Linux / Mac

Besides, there is also a source package, which is used when none of the above binary packages is usable, e.g. for Windows users.

PS: publishing multiple binary packages is very common in the Python world, e.g. Beam published 22 packages in 2.28 [1] and Pandas published 16 packages in 1.2.3 [2]. We could also publish more packages if we split the package, as the cost of adding another package would be very small.

Regards,
Dian

[1] https://pypi.org/project/apache-beam/#files
[2] https://pypi.org/project/pandas/#files

Hi Xintong,

Yes, you are right that there are 9 packages in 1.12, as we added Python 3.8 support in 1.12.

Regards,
Dian

On Mar 16, 2021, at 7:45 PM, Xintong Song <tonysong...@gmail.com> wrote:

And it's not only uploaded to PyPI, but to the ASF mirrors as well.
https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/

Thank you~

Xintong Song

On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <tonysong...@gmail.com> wrote:

Actually, I think it's 9 packages, not 7.

Check here for the 1.12.2 packages:
https://pypi.org/project/apache-flink/#files

Thank you~

Xintong Song

On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <ches...@apache.org> wrote:

Am I reading this correctly that we publish 7 different artifacts just for Python?
What does the release matrix look like?

On 3/16/2021 3:45 AM, Dian Fu wrote:

Hi Xingbo,

Thanks a lot for bringing up this discussion. Actually, the size limit already became an issue when releasing 1.11.3 and 1.12.1. It blocked us from publishing the PyFlink packages to PyPI during the release, as there was not enough space left (PS: the packages were published after the size limit was increased). Considering that the total package size is about 1.5GB (220MB * 7) for each release, it makes sense to split the PyFlink package.
It could reduce the total package size to about 250MB (3MB * 7 + 220MB) for each release. We wouldn't need to increase the size limit again in the next few years, as we currently still have about 7.5GB of space left.

So +1 from my side.

Regards,
Dian

On Mar 12, 2021, at 2:30 PM, Xingbo Huang <hxbks...@gmail.com> wrote:

Hi everyone,

Since release-1.11, PyFlink has introduced Cython support, and we release 7 packages (for different platforms and Python versions) to PyPI for each release. Each package is more than 200MB, as we need to bundle the jar files into the package. The entire project space on PyPI grows very fast, and we need to apply to PyPI for more project space frequently. Please refer to https://github.com/pypa/pypi-support/issues/831 for more details.

The root cause of this problem is that we bundle the jar files in each package. This would be unnecessary if we extracted the jar files into a separate package dedicated to holding them.
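The package counts in this thread (7 per release for 1.11, 9 for 1.12) follow from the matrix Dian describes: one wheel per (Python version, platform) pair plus one source package. A small sketch, assuming 1.11 covered Python 3.5 to 3.7 (as implied by 3.8 support arriving in 1.12); the labels are simplified stand-ins, not real wheel filenames or platform tags.

```python
from itertools import product

PLATFORMS = ("linux", "macosx")  # the two binary platforms listed in the thread


def release_matrix(python_versions):
    """Return one simplified wheel label per (Python version, platform) pair, plus one sdist."""
    wheels = [f"cp{v.replace('.', '')}-{plat}" for v, plat in product(python_versions, PLATFORMS)]
    return wheels + ["sdist"]


r111 = release_matrix(["3.5", "3.6", "3.7"])         # assumed 1.11 Python versions
r112 = release_matrix(["3.5", "3.6", "3.7", "3.8"])  # 3.8 added in 1.12
print(len(r111), len(r112))  # 7 9
```

Because the jars are bundled today, every entry in this matrix carries its own copy of them; each extra Python version adds two more copies, which is why the per-release total grows so quickly.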
I'd like to propose splitting the PyFlink package into two packages: the original apache-flink and apache-flink-libraries (any suggestions for the name?). The package apache-flink-libraries only contains jar files, and there is only one apache-flink-libraries package for each release. The package apache-flink depends on apache-flink-libraries; users still only need to install apache-flink, and nothing changes for them compared to before. We still need to release multiple wheel packages of apache-flink. However, they will be very small, as they no longer contain the jar files.

Looking forward to your feedback.

Best,

Xingbo
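The dependency structure in this proposal can be expressed with ordinary setuptools metadata. The sketch below only builds the keyword arguments each package's setup.py might pass to `setuptools.setup()`. The function name `split_package_kwargs`, the version string, and the choice of an exact `==` pin are assumptions for illustration, not the actual Flink build; package contents (the jar files themselves) are deliberately left out.

```python
# Hypothetical packaging metadata for the proposed split; not the actual Flink build.
def split_package_kwargs(version: str) -> dict:
    """Return setuptools.setup() keyword arguments for both proposed packages."""
    libraries = {
        "name": "apache-flink-libraries",
        "version": version,
        # Jar-only package: uploaded once per release, no per-platform wheels.
        # (package_data pointing at the jar files would go here; layout omitted.)
    }
    flink = {
        "name": "apache-flink",
        "version": version,
        # Exact pin (an assumption) so the Python code and the jars always
        # come from the same Flink release.
        "install_requires": [f"apache-flink-libraries=={version}"],
    }
    return {"apache-flink-libraries": libraries, "apache-flink": flink}


kwargs = split_package_kwargs("1.13.0")  # hypothetical version
print(kwargs["apache-flink"]["install_requires"])  # ['apache-flink-libraries==1.13.0']
```

Each dict would be passed to `setuptools.setup(**kwargs)` in the respective package's setup.py. Only apache-flink would then be built per (Python version, platform), while users keep running `pip install apache-flink` and get the jars through the declared dependency.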