Hello Gaetan! Nice to hear from you!

On Mon, Oct 14, 2019 at 10:53 AM Gaetan Semet <[email protected]> wrote:

> Hi.
>
> I am Gaetan Semet (gsemet), the main maintainer of the stable/airflow
> chart.
> I am pretty thrilled by this conversation, and would be glad to see the
> chart switch to an official image. While the chart is pretty stable and
> already used in production by many, I have limited time to maintain it, so
> I would be very happy to see it directly maintained by the community. Of
> course I will continue to help on it as much as possible.
>
> Maybe we can start a dedicated discussion only for the chart. I have a few
> questions on how to proceed:
> - who to put as new OWNERS of this chart
>

For the governance I am happy to be added to the OWNERS list and I think
Daniel will be happy as well :).


> - do you want to take full "ownership" of it? If so, how to proceed? How
> should the governance of the chart be managed?


I think eventually we might move to the Apache governance model, where any
Apache committer can contribute to the helm chart but an official release
of the helm chart needs to be voted on by the PMC (but I will let the PMC
propose/decide on it).


> - the current implementation uses, as mentioned, the "puckel" image that is
> quite good and stable, and many users are quite happy with it. Switching to
> an official image will require documenting the change quite exhaustively,
> especially if some features get lost in the process.
>

I will make sure to review this. Once I get the POC to the stage where all
tests pass (for now I still have MySQL tests failing), I will review and
point out the differences between the Puckel image and the official image,
and will try to either bring the missing parts in or document the changes.


> - The current implementation uses the Celery executor. Do you plan on
> switching completely to the Kubernetes Executor, or maintaining 2
> configurations in the template files? Maybe a simple value for the
> executor would be enough, I don't know, I have not actually tried the kube
> executor. To my understanding, the kube executor starts/stops each task in
> a pod, which comes with a major cost for each simple task. Would it be
> possible to keep the current Celery as the default executor and let users
> switch to the kube executor?
>

I think we need Daniel to chime in on the details about Helm - he is also
working on the AIP-25 Knative executor
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-25+The+Knative+Executor>,
so we might well support three executors in the improved helm chart :).
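
To make the "single value selects the executor" idea concrete, here is a
minimal Python sketch. The names EXECUTOR_CLASSES and executor_env are
hypothetical and not taken from the chart; only the executor class names and
the AIRFLOW__CORE__EXECUTOR environment variable come from Airflow itself.
The Knative executor is left out because it is still only a proposal.

```python
# Hypothetical mapping from a chart-level value to an Airflow executor class.
# The keys are illustrative; the chart may name its values differently.
EXECUTOR_CLASSES = {
    "celery": "CeleryExecutor",
    "kubernetes": "KubernetesExecutor",
}


def executor_env(value="celery"):
    """Return the environment variable a deployment would set for `value`.

    Celery stays the default, as suggested in the question above.
    """
    try:
        executor = EXECUTOR_CLASSES[value.lower()]
    except KeyError:
        raise ValueError("unsupported executor value: %r" % value)
    return {"AIRFLOW__CORE__EXECUTOR": executor}
```

The remaining differences between the executors (workers, broker, pod
templates) would still need conditional sections in the templates; the
sketch only covers the selection itself.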


> Just some other questions on AIP-26:
> - do you plan on making alpine images as well? or minimal-ubuntu, to reduce
> the size as much as possible?
>

I made an initial attempt to support Alpine images - I started the work by
looking at the Astronomer images based on Alpine (
https://github.com/astronomer/astronomer/tree/master/docker/airflow/1.10.5)
but after some discussions with Daniel and Ash I abandoned it (and they
both support that).

The main problem with Alpine images is that Alpine uses musl as its C
standard library rather than glibc. I already had a number of problems with
installing/compiling packages required by Airflow (I spent a day trying to
make it work without using "edge" packages from Alpine). I did some
research and a comparison of sizes, and I came to the same conclusion as
here: https://pythonspeed.com/articles/base-image-python-docker-images/. I
am using the *slim-buster* Python images now - the smallest pure-Debian
Python images (no Ubuntu-specific changes) - and their size is rather small.
Not as small as the Alpine images, but taking into account the overall size
of the dependencies we have for Airflow, the difference between slim-based
and Alpine-based images is not compelling. Quoting the pythonspeed blog:
"The size benefit for Alpine isn’t even particularly compelling: the
download size of python:3.7-slim-buster is 60MB, and python:3.7-alpine is
34MB, and their uncompressed on-disk size is 180MB and 100MB respectively."
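
As a small illustration of the musl vs. glibc point: prebuilt manylinux
wheels target glibc, which is why packages tend to get compiled from source
on Alpine. The sketch below is a best-effort check of which C library the
running interpreter reports. The helper name libc_family is hypothetical;
platform.libc_ver() is the standard-library call it relies on, and the
assumption is that it reports empty strings when it does not recognise glibc.

```python
import platform


def libc_family():
    """Best-effort description of the C library the interpreter reports."""
    name, version = platform.libc_ver()
    # platform.libc_ver() looks for glibc-style markers in the executable.
    # Assumption: when it finds none (for example on a musl-based system),
    # it returns empty strings, so that case is reported as "not glibc".
    if name == "glibc":
        return "glibc " + version
    return "not glibc (possibly musl)"


print(libc_family())
```

Running this inside a candidate base image gives a quick hint about whether
binary wheels built for glibc are likely to be usable there.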

I've implemented other size optimisations in the PROD image. For example, I
do not have NPM or node_modules in the final production image - the
JavaScript is built in a separate stage and only the compiled JavaScript is
copied to the final image. The current sizes of the images (
https://cloud.docker.com/repository/docker/potiuk/airflow) are
*master-python3.6 = 410 MB*, *master-python3.5 = 408 MB* and
*master-python3.7 = 387 MB*, so the alpine vs. slim difference is < 20% of
the size (and it might be less, as the dependencies might be bigger on
Alpine). Debian is also certainly more future-proof in case we add new
dependencies - for precisely the same reason (musl vs. glibc support).
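
The size argument can be checked with a few lines of arithmetic. This is
only a sketch using the figures quoted above; the names BASE_DOWNLOAD,
BASE_ON_DISK, AIRFLOW_IMAGES and saving_share are made up for the example.
It is not stated here whether the Airflow image sizes are compressed or
on-disk, so both ratios are computed.

```python
# Sizes in MB, copied from the figures quoted in this message.
BASE_DOWNLOAD = {"slim-buster": 60, "alpine": 34}
BASE_ON_DISK = {"slim-buster": 180, "alpine": 100}
AIRFLOW_IMAGES = {
    "master-python3.5": 408,
    "master-python3.6": 410,
    "master-python3.7": 387,
}


def saving_share(base_sizes, image_mb):
    """Share of an Airflow image that switching the base to Alpine could save.

    This is an upper bound: it assumes the dependencies take the same space
    on both base images.
    """
    saving = base_sizes["slim-buster"] - base_sizes["alpine"]
    return saving / image_mb


for tag, size in sorted(AIRFLOW_IMAGES.items()):
    print(
        tag,
        "download: %.1f%%" % (100 * saving_share(BASE_DOWNLOAD, size)),
        "on-disk: %.1f%%" % (100 * saving_share(BASE_ON_DISK, size)),
    )
```

With these figures the share comes out at roughly 6 to 7 percent using the
download sizes and around 20 percent using the on-disk sizes.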

I think overall it's not worth supporting multiple base images, as it would
add a lot of complexity to the image/build process with rather limited
benefits.

> - helm v3 is about to be released, and this may have a major impact on
> charts (especially they can be de-centralized). I do not know if this
> concerns also the "stable" ones, but if so, it would make sense to host the
> chart aside of airflow's code, wouldn't it?
>

Again - I think Daniel will be a better person to comment on that :).


>
> So, tell me how I could help.
>

I think reviews and comments, both in the PR and in the discussion here, are
the best way to help. Also, once we have the image more or less ready, I
will ask for help with testing it in various scenarios, and I'd appreciate
your help there.


>
> Best Regards,
> Gaetan Semet
>
> > On Mon, Oct 14, 2019 at 8:42 AM Jarek Potiuk <[email protected]>
> > wrote:
> >
> > > Issue created! https://github.com/helm/charts/issues/17933 . Thanks
> > > Jonathan for feedback and bringing this up!
> > >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
