+1, great ideas

Let's make sure there's a dedicated section in the docs on how to "migrate"
from pulsar-all:3.2.0 to "build your own -all image"
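
As a possible starting point for that section, here is a minimal sketch of a
"build your own" image that bundles a chosen set of connectors. It reuses the
download URL pattern from the proposal quoted below. The `PULSAR_VERSION` and
`CONNECTORS` build arguments and the connector selection are my own
assumptions for illustration, not an agreed convention.

```dockerfile
# Pin the base image to the same release as the connectors (assumed: 3.2.0).
ARG PULSAR_VERSION=3.2.0
FROM apachepulsar/pulsar:${PULSAR_VERSION}

# An ARG declared before FROM must be re-declared to be visible afterwards.
ARG PULSAR_VERSION
# Space-separated connector names; this selection is only an example.
ARG CONNECTORS="elastic-search kafka"

# Download each selected connector archive into ./connectors,
# failing the build if any download fails.
RUN mkdir -p connectors && \
    cd connectors && \
    for c in ${CONNECTORS}; do \
      wget "https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${c}-${PULSAR_VERSION}.nar" || exit 1; \
    done
```

Something like `docker build --build-arg CONNECTORS="elastic-search" -t my-pulsar .`
would then pick the connectors at build time (the image tag is a placeholder).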

Nicolò Boschi


On Wed, Mar 6, 2024 at 04:22 Matteo Merli <matteo.me...@gmail.com> wrote:

> I was proposing `pulsar-functions-python`, though I'm open to any other name
> --
> Matteo Merli
> <matteo.me...@gmail.com>
>
>
> On Tue, Mar 5, 2024 at 6:43 PM Dave Fisher <w...@apache.org> wrote:
>
> > What would be the name of the image that contains the functions runtime?
> >
> > Best,
> > Dave
> >
> > > On Mar 5, 2024, at 6:37 PM, Lari Hotari <lhot...@apache.org> wrote:
> > >
> > > These are very welcome changes! Let's go ahead asap.
> > >
> > > -Lari
> > >
> > > On Wed, 6 Mar 2024 at 01:04, Matteo Merli <matteo.me...@gmail.com> wrote:
> > >>
> > >> The Docker image `pulsar-all` is a convenience image built on top of the
> > >> base `pulsar` image. It includes all the Pulsar IO connectors as well as
> > >> the tiered storage offloaders.
> > >>
> > >> The Dockerfile for `pulsar-all` can be found here:
> > >>
> > >> https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > >>
> > >> The resulting image is very big:
> > >>
> > >> ```
> > >> apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > >> ```
> > >>
> > >> This poses a challenge in several ways:
> > >> 1. Our CI pipeline needs to build these images and cache them across
> > >> different stages of the pipeline.
> > >> 2. It takes release managers a long time to build and push these images
> > >> to Docker Hub.
> > >> 3. Users running this image in production see very long download times,
> > >> which can affect the availability of the system (e.g., a second broker
> > >> is more likely to crash while a restart is taking a very long time).
> > >> 4. It's very unlikely that any one user will require all the connectors;
> > >> most likely, they would use just 2-3 of them.
> > >>
> > >> The problem is that `pulsar-all` was introduced at a time when there were
> > >> ~3 Pulsar IO connectors. Right now we have 35 connectors, with a total
> > >> size of 1.9 GB.
> > >>
> > >> The proposal here is to drop this image altogether. Users will be able to
> > >> construct their own targeted images in a very simple way:
> > >>
> > >> ```
> > >> FROM apachepulsar/pulsar:latest
> > >> RUN mkdir -p connectors && \
> > >>    cd connectors && \
> > >>    wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > >> ```
> > >>
> > >>
> > >>
> > >> ### Pulsar Functions Python Runtime
> > >>
> > >> In order to support the Python functions runtime, we have been loading
> > >> the Pulsar base image with quite a few dependencies, from the
> > >> `pulsar-client` Python SDK to gRPC, which is quite a heavy package with
> > >> many transitive dependencies.
> > >>
> > >> Given that the vast majority of users would be using the `pulsar` base
> > >> image to run brokers and not Python functions, it would make sense to
> > >> split the Python support into a different image, like
> > >> `pulsar-functions-python`, which extends the base image and adds all the
> > >> needed Python dependencies.
> > >>
> > >> This way it will be very easy for users to select the appropriate image,
> > >> and we wouldn't be shipping a large number of unused Python dependencies
> > >> to users who don't need them.
> > >>
> > >>
> > >> What are people's opinions with respect to this?
> > >>
> > >> Matteo
> > >>
> > >> --
> > >> Matteo Merli
> > >> <matteo.me...@gmail.com>
> >
> >
>
