+1, great ideas.

Let's make sure there's a dedicated section in the docs on how to "migrate" from pulsar-all:3.2.0 to "build your own -all image".
Nicolò Boschi

On Wed, Mar 6, 2024 at 04:22, Matteo Merli <matteo.me...@gmail.com> wrote:

> I was proposing `pulsar-functions-python`, though I'm open to any other
> name
> --
> Matteo Merli
> <matteo.me...@gmail.com>
>
>
> On Tue, Mar 5, 2024 at 6:43 PM Dave Fisher <w...@apache.org> wrote:
>
> > What would be the name of the image that contains the functions runtime?
> >
> > Best,
> > Dave
> >
> > > On Mar 5, 2024, at 6:37 PM, Lari Hotari <lhot...@apache.org> wrote:
> > >
> > > These are very welcome changes! Let's go ahead asap.
> > >
> > > -Lari
> > >
> > > On Wed, 6 Mar 2024 at 01:04, Matteo Merli <matteo.me...@gmail.com> wrote:
> > >>
> > >> The docker image `pulsar-all` is a convenience image that is created on top
> > >> of the base `pulsar` image, including all the Pulsar IO connectors as well
> > >> as the tiered storage offloaders.
> > >>
> > >> The Dockerfile for `pulsar-all` can be found here:
> > >> https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > >>
> > >> The resulting image is very big:
> > >>
> > >> ```
> > >> apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > >> ```
> > >>
> > >> This poses a challenge in many ways:
> > >> 1. Our CI pipeline needs to build these images and cache them across
> > >> different stages of the pipeline
> > >> 2. It takes a lot of time for release managers to build and push these
> > >> images to Docker Hub
> > >> 3. Users using this image in production see very long download times,
> > >> something that can affect the availability of the system (eg: more chances
> > >> of a 2nd broker to crash if a restart takes a very long time).
> > >> 4. It's very unlikely that one user will require all the connectors, most
> > >> likely, it would use just 2-3 of them.
> > >>
> > >> The problem is that `pulsar-all` was introduced at a time when there were
> > >> ~3 Pulsar IO connectors. Right now we do have 35 connectors, with a 1.9 GB
> > >> total size.
> > >>
> > >> The proposal here is to drop this image altogether. Users will be able to
> > >> construct their own targeted images in a very simple way:
> > >>
> > >> ```
> > >> FROM apachepulsar/pulsar:latest
> > >> RUN mkdir -p connectors && \
> > >>     cd connectors && \
> > >>     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > >> ```
> > >>
> > >>
> > >>
> > >> ### Pulsar Functions Python Runtime
> > >>
> > >> In order to support Python functions runtime, we have been including the
> > >> Pulsar base image with quite a bit of dependencies, from `pulsar-client`
> > >> Python SDK, to gRPC which is quite a heavy package with many transitive
> > >> dependencies.
> > >>
> > >> Given that the vast majority would be using the `pulsar` base image to run
> > >> brokers and not python functions, it would make sense to split the Python
> > >> support into a different image, like `pulsar-functions-python`, which
> > >> extends from the base image and adds all the needed Python dependencies.
> > >>
> > >> This way it will be very easy for users to select the appropriate image and
> > >> we wouldn't be carrying a big amount of useless Python dependencies to
> > >> users who don't need them.
> > >>
> > >>
> > >> What are people's opinions with respect to this?
> > >>
> > >> Matteo
> > >>
> > >> --
> > >> Matteo Merli
> > >> <matteo.me...@gmail.com>
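
For the migration docs, the "build your own image" approach from the proposal could be extended to several connectors along these lines. This is only a sketch under assumptions: the `PULSAR_VERSION` build argument and the choice of the `elastic-search` and `kafka` connectors are illustrative, the URL pattern is copied from the example in the proposal, and whether a given release is still hosted at that location (and whether `wget` is present in the base image) would need to be checked.

```dockerfile
# Sketch only: the version and the connector list are illustrative assumptions.
FROM apachepulsar/pulsar:3.2.0

# Hypothetical build argument; keep it aligned with the base image tag above.
ARG PULSAR_VERSION=3.2.0

# Download only the connectors this deployment needs, reusing the URL
# pattern from the proposal. The build fails if any download fails.
RUN mkdir -p connectors && \
    cd connectors && \
    for name in elastic-search kafka; do \
      wget "https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${name}-${PULSAR_VERSION}.nar" || exit 1; \
    done
```

Someone moving off `pulsar-all` would list only the connectors they actually use in the `for` loop and build the result with `docker build`, which keeps the image close to the size of the base `pulsar` image plus the selected `.nar` files.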