+1

Great idea

Enrico

On Wed, Mar 6, 2024, 08:23 Zixuan Liu <node...@gmail.com> wrote:

> +1
>
> This is a good idea, and then we must provide documentation on building
> your own connector image and Python functions runtime image.
>
> Thanks,
> Zixuan
>
> On Wed, Mar 6, 2024 at 07:04, Matteo Merli <matteo.me...@gmail.com> wrote:
>
> > The docker image `pulsar-all` is a convenience image that is created on
> > top of the base `pulsar` image, including all the Pulsar IO connectors as
> > well as the tiered storage offloaders.
> >
> > The Dockerfile for `pulsar-all` can be found here:
> >
> > https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> >
> > The resulting image is very big:
> >
> > ```
> > apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > ```
> >
> > This poses a challenge in several ways:
> >  1. Our CI pipeline needs to build these images and cache them across
> >     different stages of the pipeline.
> >  2. It takes a lot of time for release managers to build and push these
> >     images to Docker Hub.
> >  3. Users running this image in production see very long download times,
> >     which can affect the availability of the system (e.g., a higher chance
> >     of a second broker crashing while a restart takes a very long time).
> >  4. It is very unlikely that any one user will require all the connectors;
> >     most likely, they would use just 2-3 of them.
> >
> > The problem is that `pulsar-all` was introduced at a time when there were
> > ~3 Pulsar IO connectors. Right now we have 35 connectors, with a total
> > size of 1.9 GB.
> >
> > The proposal here is to drop this image altogether. Users will be able to
> > construct their own targeted images in a very simple way:
> >
> > ```
> > FROM apachepulsar/pulsar:latest
> > RUN mkdir -p connectors && \
> >     cd connectors && \
> >     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > ```
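> >
> > As a rough sketch only, the same pattern could be parameterized so that
> > one Dockerfile covers whichever connectors a user needs. The
> > `PULSAR_VERSION` and `CONNECTORS` build arguments are hypothetical names
> > introduced for illustration, and the sketch assumes that other connector
> > files follow the same `pulsar-io-<name>-<version>.nar` naming as the
> > Elasticsearch one above and that `wget` is available in the base image:
> >
> > ```
> > FROM apachepulsar/pulsar:latest
> >
> > # Hypothetical build arguments (not part of the proposal)
> > ARG PULSAR_VERSION=3.2.0
> > # Space-separated connector names; only "elastic-search" is taken from
> > # the example above
> > ARG CONNECTORS="elastic-search"
> >
> > # Download each requested connector; fail the build if any download fails
> > RUN mkdir -p connectors && \
> >     cd connectors && \
> >     for c in ${CONNECTORS}; do \
> >       wget "https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${c}-${PULSAR_VERSION}.nar" || exit 1; \
> >     done
> > ```
> >
> > Such an image could then be built with, for example,
> > `docker build --build-arg CONNECTORS="elastic-search" -t my-pulsar .`
> > (the `my-pulsar` tag is a placeholder).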
> >
> >
> >
> > ### Pulsar Functions Python Runtime
> >
> > In order to support the Python functions runtime, we have been bundling
> > quite a few dependencies into the Pulsar base image, from the
> > `pulsar-client` Python SDK to gRPC, which is quite a heavy package with
> > many transitive dependencies.
> >
> > Given that the vast majority of users run the `pulsar` base image for
> > brokers and not Python functions, it would make sense to split the Python
> > support into a different image, like `pulsar-functions-python`, which
> > extends the base image and adds all the needed Python dependencies.
> >
> > This way it will be very easy for users to select the appropriate image,
> > and we wouldn't be shipping a large number of Python dependencies to
> > users who don't need them.
> >
> >
> > What are people's opinions with respect to this?
> >
> > Matteo
> >
> > --
> > Matteo Merli
> > <matteo.me...@gmail.com>
> >
>
