Using the Alpine-based image (PIP-324, in progress) and removing Python, I was able to get the `pulsar` base image down to ~350 MB.
There could be additional space savings by removing unused JVM modules from the image.

--
Matteo Merli
<matteo.me...@gmail.com>

On Thu, Mar 7, 2024 at 10:09 AM Girish Sharma <scrapmachi...@gmail.com> wrote:

> +1
> We have recently been struggling with building a Pulsar image in-house
> (lots of app sec constraints etc.). A much reduced and minimal image
> would certainly help there.
>
> Any estimates on the size reduction in the base pulsar image after
> removal of Python-related content? Is there scope for further slimming
> down the base pulsar image by removing anything non-essential to
> running a broker (or a bookie or zk)?
>
> Regards
>
> On Thu, Mar 7, 2024 at 11:19 PM Neng Lu <freen...@gmail.com> wrote:
>
> > +1
> >
> > This can reduce the image size significantly and thus improve the
> > efficiency and reduce the cost.
> >
> > On Tue, Mar 5, 2024 at 11:25 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> >
> > > +1
> > >
> > > Great idea
> > >
> > > Enrico
> > >
> > > On Wed, Mar 6, 2024, 08:23 Zixuan Liu <node...@gmail.com> wrote:
> > >
> > > > +1
> > > >
> > > > This is a good idea, and then we must provide a document on
> > > > building your own connector image and Python functions runtime
> > > > image.
> > > >
> > > > Thanks,
> > > > Zixuan
> > > >
> > > > On Wed, Mar 6, 2024 at 07:04, Matteo Merli <matteo.me...@gmail.com> wrote:
> > > >
> > > > > The docker image `pulsar-all` is a convenience image that is
> > > > > created on top of the base `pulsar` image, including all the
> > > > > Pulsar IO connectors as well as the tiered storage offloaders.
> > > > >
> > > > > The Dockerfile for `pulsar-all` can be found here:
> > > > > https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > > > >
> > > > > The resulting image is very big:
> > > > >
> > > > > ```
> > > > > apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > > > > ```
> > > > >
> > > > > This poses a challenge in many ways:
> > > > > 1. Our CI pipeline needs to build these images and cache them
> > > > >    across different stages of the pipeline.
> > > > > 2. It takes a lot of time for release managers to build and push
> > > > >    these images to Docker Hub.
> > > > > 3. Users using this image in production see very long download
> > > > >    times, something that can affect the availability of the
> > > > >    system (e.g. a second broker is more likely to crash if a
> > > > >    restart takes a very long time).
> > > > > 4. It's very unlikely that one user will require all the
> > > > >    connectors; most likely, they would use just 2-3 of them.
> > > > >
> > > > > The problem is that `pulsar-all` was introduced at a time when
> > > > > there were ~3 Pulsar IO connectors. Right now we have 35
> > > > > connectors, with a 1.9 GB total size.
> > > > >
> > > > > The proposal here is to drop this image altogether.
> > > > > Users will be able to construct their own targeted images in a
> > > > > very simple way:
> > > > >
> > > > > ```
> > > > > FROM apachepulsar/pulsar:latest
> > > > > RUN mkdir -p connectors && \
> > > > >     cd connectors && \
> > > > >     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > > > > ```
> > > > >
> > > > > ### Pulsar Functions Python Runtime
> > > > >
> > > > > In order to support the Python functions runtime, we have been
> > > > > shipping the Pulsar base image with quite a few dependencies,
> > > > > from the `pulsar-client` Python SDK to gRPC, which is quite a
> > > > > heavy package with many transitive dependencies.
> > > > >
> > > > > Given that the vast majority of users would be using the
> > > > > `pulsar` base image to run brokers and not Python functions, it
> > > > > would make sense to split the Python support into a different
> > > > > image, like `pulsar-functions-python`, which extends the base
> > > > > image and adds all the needed Python dependencies.
> > > > >
> > > > > This way it will be very easy for users to select the
> > > > > appropriate image, and we wouldn't be shipping a large number of
> > > > > unneeded Python dependencies to users who don't need them.
> > > > >
> > > > > What are people's opinions with respect to this?
> > > > >
> > > > > Matteo
> > > > >
> > > > > --
> > > > > Matteo Merli
> > > > > <matteo.me...@gmail.com>
> >
> > --
> > Best Regards,
> > Neng
>
> --
> Girish Sharma