Using the Alpine-based image (PIP-324, still in progress) and removing Python,
I was able to get the `pulsar` base image down to ~350 MB.

There could be additional space savings by removing unused JVM modules from
the image.
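
For illustration, here is a rough, untested sketch of that approach using `jlink`. It assumes a musl-compatible JDK 17 build stage (`eclipse-temurin:17-jdk-alpine`) to match an Alpine runtime. The module list is a placeholder, not the set Pulsar actually needs:

```
# Sketch only: build a trimmed Java runtime containing just the listed modules.
FROM eclipse-temurin:17-jdk-alpine AS jre-build
# Placeholder module list. Derive the real one from the Pulsar jars with
# `jdeps --print-module-deps`, then test the result, since jdeps can miss
# modules that are only loaded reflectively (e.g. security providers).
RUN jlink \
      --add-modules java.base,java.logging,java.management,java.naming,java.sql,jdk.crypto.ec,jdk.unsupported \
      --strip-debug \
      --no-man-pages \
      --no-header-files \
      --compress=2 \
      --output /opt/jre

# Runtime stage: the trimmed JRE replaces a full JDK.
FROM alpine:3.19
COPY --from=jre-build /opt/jre /opt/jre
ENV JAVA_HOME=/opt/jre
ENV PATH="/opt/jre/bin:${PATH}"
# Quick check that the runtime starts and lists the modules it contains.
RUN java --list-modules
```

The multi-stage build means the JDK tooling (including `jlink` itself) never reaches the final image; only the `/opt/jre` output is copied over.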


--
Matteo Merli
<matteo.me...@gmail.com>


On Thu, Mar 7, 2024 at 10:09 AM Girish Sharma <scrapmachi...@gmail.com>
wrote:

> +1
> We have recently been struggling to build a Pulsar image in-house (lots of
> AppSec constraints, etc.). A much smaller, minimal image would certainly
> help there.
>
> Is there an estimate of the size reduction of the base `pulsar` image after
> removing the Python-related content? Is there scope to slim down the base
> image further by removing anything not essential for running a broker (or a
> bookie or ZooKeeper)?
>
> Regards
>
> On Thu, Mar 7, 2024 at 11:19 PM Neng Lu <freen...@gmail.com> wrote:
>
> > +1
> >
> > This could significantly reduce the image size, improving efficiency and
> > reducing cost.
> >
> > On Tue, Mar 5, 2024 at 11:25 PM Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > Great idea
> > >
> > > Enrico
> > >
> > > On Wed, Mar 6, 2024, 08:23 Zixuan Liu <node...@gmail.com> wrote:
> > >
> > > > +1
> > > >
> > > > This is a good idea, and we must then provide documentation on building
> > > > custom connector images and a Python functions runtime image.
> > > >
> > > > Thanks,
> > > > Zixuan
> > > >
> > > > On Wed, Mar 6, 2024 at 07:04 Matteo Merli <matteo.me...@gmail.com> wrote:
> > > >
> > > > > The Docker image `pulsar-all` is a convenience image built on top of the
> > > > > base `pulsar` image; it includes all the Pulsar IO connectors as well as
> > > > > the tiered storage offloaders.
> > > > >
> > > > > The Dockerfile for `pulsar-all` can be found here:
> > > > >
> > > > > https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > > > >
> > > > > The resulting image is very big:
> > > > >
> > > > > ```
> > > > > apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > > > > ```
> > > > >
> > > > > This poses a challenge in many ways:
> > > > >  1. Our CI pipeline needs to build these images and cache them across
> > > > >     different stages of the pipeline.
> > > > >  2. It takes a lot of time for release managers to build and push these
> > > > >     images to Docker Hub.
> > > > >  3. Users running this image in production see very long download times,
> > > > >     which can affect the availability of the system (e.g., a greater
> > > > >     chance of a second broker crashing if a restart takes a very long
> > > > >     time).
> > > > >  4. It's very unlikely that a single user needs all the connectors; most
> > > > >     likely they would use just 2-3 of them.
> > > > >
> > > > > The problem is that `pulsar-all` was introduced at a time when there were
> > > > > ~3 Pulsar IO connectors. Today we have 35 connectors, with a total size
> > > > > of 1.9 GB.
> > > > >
> > > > > The proposal here is to drop this image altogether. Users will be able to
> > > > > construct their own targeted images in a very simple way:
> > > > >
> > > > > ```
> > > > > # Use the same Pulsar version for the base image and the connector
> > > > > FROM apachepulsar/pulsar:3.2.0
> > > > > RUN mkdir -p connectors && \
> > > > >     cd connectors && \
> > > > >     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > > > > ```
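> > > > >
> > > > > A slightly fuller, untested sketch along the same lines, with several
> > > > > connectors and a single pinned version. It assumes the connector file
> > > > > names follow the pattern of the Elasticsearch URL above and that Pulsar
> > > > > lives in `/pulsar` in the base image; the connector list is only an
> > > > > example:
> > > > >
> > > > > ```
> > > > > # Sketch: a targeted image with only the connectors a deployment uses.
> > > > > # One build argument keeps the base image and connector versions in sync.
> > > > > ARG PULSAR_VERSION=3.2.0
> > > > >
> > > > > # Download stage: curl stays out of the final image.
> > > > > FROM alpine:3.19 AS downloader
> > > > > ARG PULSAR_VERSION
> > > > > RUN apk add --no-cache curl
> > > > > WORKDIR /connectors
> > > > > # Example connector names; replace with the ones you actually need.
> > > > > RUN for c in elastic-search kafka; do \
> > > > >       curl -fsSLO "https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${c}-${PULSAR_VERSION}.nar" || exit 1; \
> > > > >     done
> > > > >
> > > > > FROM apachepulsar/pulsar:${PULSAR_VERSION}
> > > > > COPY --from=downloader /connectors/ /pulsar/connectors/
> > > > > ```
> > > > >
> > > > > The `|| exit 1` makes the build fail if any download fails, instead of
> > > > > silently producing an image with a missing connector.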
> > > > >
> > > > > ### Pulsar Functions Python Runtime
> > > > >
> > > > > To support the Python functions runtime, we have been bundling quite a
> > > > > few dependencies into the Pulsar base image, from the `pulsar-client`
> > > > > Python SDK to gRPC, which is a heavy package with many transitive
> > > > > dependencies.
> > > > >
> > > > > Given that the vast majority of users run brokers with the `pulsar` base
> > > > > image, not Python functions, it would make sense to split Python support
> > > > > into a separate image, such as `pulsar-functions-python`, which extends
> > > > > the base image and adds all the needed Python dependencies.
> > > > >
> > > > > This way, users can easily select the appropriate image, and we would no
> > > > > longer ship a large amount of unused Python dependencies to users who
> > > > > don't need them.
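> > > > >
> > > > > As a rough, untested sketch, such an image could look like the following.
> > > > > It assumes a Debian/Ubuntu-based base image that no longer ships Python
> > > > > and runs as UID 10000 (as recent `apachepulsar/pulsar` images do); the
> > > > > tag and the package list are illustrative:
> > > > >
> > > > > ```
> > > > > # Hypothetical pulsar-functions-python image (sketch only).
> > > > > # Placeholder tag: assumes a future base image without Python.
> > > > > FROM apachepulsar/pulsar:3.2.0
> > > > > # Installing system packages requires root.
> > > > > USER root
> > > > > RUN apt-get update && \
> > > > >     apt-get install -y --no-install-recommends python3 python3-pip && \
> > > > >     rm -rf /var/lib/apt/lists/*
> > > > > # Python SDK plus gRPC for the functions runtime; pin versions in practice.
> > > > > RUN pip3 install --no-cache-dir pulsar-client grpcio protobuf
> > > > > # Drop back to the unprivileged user the base image runs as.
> > > > > USER 10000
> > > > > ```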
> > > > >
> > > > >
> > > > > What are people's opinions with respect to this?
> > > > >
> > > > > Matteo
> > > > >
> > > > > --
> > > > > Matteo Merli
> > > > > <matteo.me...@gmail.com>
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best Regards,
> > Neng
> >
>
>
> --
> Girish Sharma
>
