+1

This can reduce the image size significantly, which improves efficiency
and reduces cost.

On Tue, Mar 5, 2024 at 11:25 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> +1
>
> Great idea
>
> Enrico
>
> On Wed, Mar 6, 2024, 08:23 Zixuan Liu <node...@gmail.com> wrote:
>
> > +1
> >
> > This is a good idea. We should then provide documentation on building
> > your own connector image and Python functions runtime image.
> >
> > Thanks,
> > Zixuan
> >
> > On Wed, Mar 6, 2024 at 07:04, Matteo Merli <matteo.me...@gmail.com> wrote:
> >
> > > The docker image `pulsar-all` is a convenience image that is created on top
> > > of the base `pulsar` image, including all the Pulsar IO connectors as well
> > > as the tiered storage offloaders.
> > >
> > > The Dockerfile for `pulsar-all` can be found here:
> > >
> > > https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > >
> > > The resulting image is very big:
> > >
> > > ```
> > > apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > > ```
> > >
> > > This poses a challenge in many ways:
> > >  1. Our CI pipeline needs to build these images and cache them across
> > > different stages of the pipeline.
> > >  2. It takes a lot of time for release managers to build and push these
> > > images to Docker Hub.
> > >  3. Users running this image in production see very long download times,
> > > which can affect the availability of the system (e.g., a higher chance
> > > of a second broker crashing if a restart takes a very long time).
> > >  4. It's very unlikely that any one user will require all the connectors;
> > > most likely, they would use just 2-3 of them.
> > >
> > > The problem is that `pulsar-all` was introduced at a time when there were
> > > ~3 Pulsar IO connectors. We now have 35 connectors, with a total size of
> > > 1.9 GB.
> > >
> > > The proposal here is to drop this image altogether. Users will be able to
> > > construct their own targeted images in a very simple way:
> > >
> > > ```
> > > FROM apachepulsar/pulsar:latest
> > > RUN mkdir -p connectors && \
> > >     cd connectors && \
> > >     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > > ```
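> > >
> > > A slightly fuller sketch of the same idea, pinning the base image and the
> > > connector to a single version through a build argument. The
> > > `PULSAR_VERSION` argument name is only illustrative and not part of the
> > > proposal. It also assumes that `wget` is available in the base image and
> > > that the chosen release is still hosted under downloads.apache.org:
> > >
> > > ```
> > > # Sketch only: the version and the connector choice are assumptions.
> > > ARG PULSAR_VERSION=3.2.0
> > > FROM apachepulsar/pulsar:${PULSAR_VERSION}
> > > # An ARG declared before FROM must be re-declared to be usable afterwards.
> > > ARG PULSAR_VERSION
> > > # Download only the connector that is actually needed.
> > > RUN mkdir -p connectors && \
> > >     cd connectors && \
> > >     wget https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-elastic-search-${PULSAR_VERSION}.nar
> > > ```
> > >
> > > Building with `docker build --build-arg PULSAR_VERSION=3.2.0 .` would then
> > > keep the base image and the connector on the same release.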
> > >
> > >
> > >
> > > ### Pulsar Functions Python Runtime
> > >
> > > To support the Python functions runtime, we have been shipping the
> > > Pulsar base image with quite a few dependencies, from the `pulsar-client`
> > > Python SDK to gRPC, which is quite a heavy package with many transitive
> > > dependencies.
> > >
> > > Given that the vast majority of users run the `pulsar` base image for
> > > brokers and not for Python functions, it would make sense to split the
> > > Python support into a different image, like `pulsar-functions-python`,
> > > which extends the base image and adds all the needed Python dependencies.
> > >
> > > This way it will be very easy for users to select the appropriate image,
> > > and we wouldn't be shipping a large number of Python dependencies to
> > > users who don't need them.
> > >
> > >
> > > What are people's opinions with respect to this?
> > >
> > > Matteo
> > >
> > > --
> > > Matteo Merli
> > > <matteo.me...@gmail.com>
> > >
> >
>


-- 
Best Regards,
Neng
