I was proposing `pulsar-functions-python`, though I'm open to any other
name
--
Matteo Merli
<matteo.me...@gmail.com>


On Tue, Mar 5, 2024 at 6:43 PM Dave Fisher <w...@apache.org> wrote:

> What would be the name of the image that contains the functions runtime?
>
> Best,
> Dave
>
> > On Mar 5, 2024, at 6:37 PM, Lari Hotari <lhot...@apache.org> wrote:
> >
> > These are very welcome changes! Let's go ahead asap.
> >
> > -Lari
> >
> > On Wed, 6 Mar 2024 at 01:04, Matteo Merli <matteo.me...@gmail.com> wrote:
> >>
> >> The Docker image `pulsar-all` is a convenience image that is built on
> >> top of the base `pulsar` image and includes all the Pulsar IO
> >> connectors as well as the tiered storage offloaders.
> >>
> >> The Dockerfile for `pulsar-all` can be found here:
> >>
> >> https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> >>
> >> The resulting image is very big:
> >>
> >> ```
> >> apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> >> ```
> >>
> >> This poses challenges in several ways:
> >> 1. Our CI pipeline needs to build these images and cache them across
> >> different stages of the pipeline.
> >> 2. It takes a lot of time for release managers to build and push these
> >> images to Docker Hub.
> >> 3. Users using this image in production see very long download times,
> >> something that can affect the availability of the system (e.g., a
> >> higher chance of a second broker crashing if a restart takes a very
> >> long time).
> >> 4. It's very unlikely that any one user will require all the
> >> connectors; most likely, they would use just 2-3 of them.
> >>
> >> The problem is that `pulsar-all` was introduced at a time when there
> >> were ~3 Pulsar IO connectors. Right now we have 35 connectors, with a
> >> total size of 1.9 GB.
> >>
> >> The proposal here is to drop this image altogether. Users will be able
> >> to construct their own targeted images in a very simple way:
> >>
> >> ```
> >> FROM apachepulsar/pulsar:latest
> >> RUN mkdir -p connectors && \
> >>    cd connectors && \
> >>    wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> >> ```
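> >>
> >> As a minimal sketch of the same idea extended to more than one
> >> connector, the variant below pins the version and downloads a short
> >> list of connectors. The `PULSAR_VERSION` and `CONNECTORS` build
> >> arguments, the choice of the `kafka` connector, and the pinned `3.2.0`
> >> tag are assumptions for illustration; the URL layout is taken from the
> >> example above and should be verified for the release in use.
> >>
> >> ```
> >> # Assumption: pin the base image to the same release as the connectors
> >> ARG PULSAR_VERSION=3.2.0
> >> FROM apachepulsar/pulsar:${PULSAR_VERSION}
> >> # Re-declare the version after FROM so it is visible in this build stage
> >> ARG PULSAR_VERSION
> >> # Hypothetical space-separated list of connector names to include
> >> ARG CONNECTORS="elastic-search kafka"
> >> # Download each requested connector; stop the build if any download fails
> >> RUN mkdir -p connectors && \
> >>    cd connectors && \
> >>    for c in ${CONNECTORS}; do \
> >>      wget "https://downloads.apache.org/pulsar/pulsar-${PULSAR_VERSION}/connectors/pulsar-io-${c}-${PULSAR_VERSION}.nar" || exit 1; \
> >>    done
> >> ```
> >>
> >> With this layout, something like
> >> `docker build --build-arg CONNECTORS="elastic-search" .` would produce
> >> an image containing only that connector.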
> >>
> >>
> >>
> >> ### Pulsar Functions Python Runtime
> >>
> >> In order to support the Python functions runtime, we have been loading
> >> the Pulsar base image with quite a few dependencies, from the
> >> `pulsar-client` Python SDK to gRPC, which is quite a heavy package with
> >> many transitive dependencies.
> >>
> >> Given that the vast majority of users would be using the `pulsar` base
> >> image to run brokers and not Python functions, it would make sense to
> >> split the Python support into a different image, like
> >> `pulsar-functions-python`, which extends the base image and adds all
> >> the needed Python dependencies.
> >>
> >> This way it will be very easy for users to select the appropriate
> >> image, and we wouldn't be shipping a large number of unused Python
> >> dependencies to users who don't need them.
> >>
> >>
> >> What are people's opinions with respect to this?
> >>
> >> Matteo
> >>
> >> --
> >> Matteo Merli
> >> <matteo.me...@gmail.com>
>
>
