+1

Most users don't need all of the built-in connectors; the image is too bloated.

Best Regards,
Ran Gao

On 2024/03/05 23:02:37 Matteo Merli wrote:
> The docker image `pulsar-all` is a convenience image that is created on top
> of the base `pulsar` image, including all the Pulsar IO connectors as well
> as the tiered storage offloaders.
> 
> The Dockerfile for `pulsar-all` can be found here:
> https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> 
> The resulting image is very big:
> 
> ```
> apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> ```
> 
> This poses several challenges:
>  1. Our CI pipeline needs to build these images and cache them across
> different stages of the pipeline.
>  2. It takes a lot of time for release managers to build and push these
> images to Docker Hub.
>  3. Users running this image in production see very long download times,
> which can affect the availability of the system (e.g., a higher chance of
> a second broker crashing if a restart takes a very long time).
>  4. It's very unlikely that one user will require all the connectors; most
> likely, they would use just 2-3 of them.
> 
> The problem is that `pulsar-all` was introduced at a time when there were
> only ~3 Pulsar IO connectors. Today we have 35 connectors, totaling 1.9 GB.
> 
> The proposal here is to drop this image altogether. Users will be able to
> construct their own targeted images in a very simple way:
> 
> ```
> FROM apachepulsar/pulsar:latest
> RUN mkdir -p connectors && \
>     cd connectors && \
>     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> ```
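>
> A slightly extended sketch of the same approach, for the common case of
> 2-3 connectors. Like the example above, it assumes `wget` is available in
> the base image; the version and the connector names are illustrative. It
> also pins the base image to the same release as the connectors, since
> mixing `latest` with a 3.2.0 `.nar` could lead to a version mismatch once
> `latest` moves on.
>
> ```dockerfile
> # Sketch: a targeted image containing only the connectors in use.
> # PULSAR_VERSION and the connector names are example values; check the
> # exact .nar file names in the release's connectors/ directory.
> ARG PULSAR_VERSION=3.2.0
> FROM apachepulsar/pulsar:${PULSAR_VERSION}
> # An ARG declared before FROM must be re-declared to be used after it.
> ARG PULSAR_VERSION
> # Download each connector into ./connectors (relative to the image's
> # working directory, as in the example above); abort the build if any
> # download fails.
> RUN mkdir -p connectors && \
>     cd connectors && \
>     for c in elastic-search kafka; do \
>       wget https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${c}-${PULSAR_VERSION}.nar || exit 1; \
>     done
> ```
>
> It could be built with, for example,
> `docker build --build-arg PULSAR_VERSION=3.2.0 -t my-pulsar .` (the
> `my-pulsar` tag is hypothetical). Older releases may only be available
> under archive.apache.org rather than downloads.apache.org.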
> 
> ### Pulsar Functions Python Runtime
> 
> To support the Python functions runtime, we have been bundling quite a few
> dependencies into the Pulsar base image, from the `pulsar-client` Python
> SDK to gRPC, which is a heavy package with many transitive dependencies.
> 
> Given that the vast majority of users run brokers with the `pulsar` base
> image, not Python functions, it would make sense to split Python support
> into a separate image, such as `pulsar-functions-python`, which extends the
> base image and adds all the needed Python dependencies.
> 
> This way it will be very easy for users to select the appropriate image, and
> we would no longer ship a large set of unused Python dependencies to users
> who don't need them.
> 
> 
> What are people's opinions with respect to this?
> 
> Matteo
> 
> --
> Matteo Merli
> <matteo.me...@gmail.com>
> 
