The Docker image `pulsar-all` is a convenience image built on top of the
base `pulsar` image. It includes all the Pulsar IO connectors as well as
the tiered storage offloaders.

The Dockerfile for `pulsar-all` can be found here:
https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile

The resulting image is very big:

```
apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
```

This poses a challenge in several ways:
 1. Our CI pipeline needs to build these images and cache them across
    different stages of the pipeline.
 2. It takes a lot of time for release managers to build and push these
    images to Docker Hub.
 3. Users running this image in production see very long download times,
    which can affect the availability of the system (e.g. a slow restart
    widens the window in which a second broker may crash).
 4. It's very unlikely that any one user requires all the connectors; most
    likely they would use just 2-3 of them.

The problem is that `pulsar-all` was introduced at a time when there were
~3 Pulsar IO connectors. Right now we have 35 connectors, with a total
size of 1.9 GB.

The proposal here is to drop this image altogether. Users will be able to
construct their own targeted images in a very simple way:

```
FROM apachepulsar/pulsar:latest
RUN mkdir -p connectors && \
    cd connectors && \
    wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
```
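
For users who need more than one connector, the same download could be
driven by a small shell helper. The following is only a sketch: the URL
layout is copied from the Dockerfile above, while the function names and
the `kafka` connector name are illustrative assumptions, not part of the
proposal.

```shell
#!/bin/sh
# Sketch only: build download URLs for a chosen subset of connectors.
# connector_url and connector_urls are hypothetical helper names.

# Print the download URL for one connector NAR.
#   $1 = Pulsar version (e.g. 3.2.0), $2 = connector name (e.g. elastic-search)
connector_url() {
    version="$1"
    name="$2"
    echo "https://downloads.apache.org/pulsar/pulsar-${version}/connectors/pulsar-io-${name}-${version}.nar"
}

# Print one URL per line for every connector name given after the version.
connector_urls() {
    version="$1"
    shift
    for name in "$@"; do
        connector_url "$version" "$name"
    done
}
```

After sourcing these functions, something like
`connector_urls 3.2.0 elastic-search kafka | xargs -n 1 wget -P connectors`
could then be used inside a `RUN` step, assuming `wget` is available in
the base image.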



### Pulsar Functions Python Runtime

In order to support the Python functions runtime, we have been shipping the
Pulsar base image with quite a lot of dependencies, from the `pulsar-client`
Python SDK to gRPC, which is quite a heavy package with many transitive
dependencies.

Given that the vast majority of users run the `pulsar` base image for
brokers and not for Python functions, it would make sense to split the
Python support into a different image, such as `pulsar-functions-python`,
which extends the base image and adds all the needed Python dependencies.

This way it will be very easy for users to select the appropriate image,
and we wouldn't be shipping a large set of unused Python dependencies to
users who don't need them.


What are people's opinions on this?

Matteo

--
Matteo Merli
<matteo.me...@gmail.com>
