Hi Jarek,

I really like the idea of having a slim Airflow Docker image. 500MB uncompressed is tiny 👍

Thanks,
Ping

On Sun, May 1, 2022 at 8:41 AM Jarek Potiuk wrote:

> And just to clarify: those "slim" images are not at all "toothless". You
> can actually do stuff with them :)
>
> The 4 preinstalled providers are:
>
> apache-airflow-providers-ftp    | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114                  | 2.1.2
> apache-airflow-providers-http   | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/              | 2.1.2
> apache-airflow-providers-imap   | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501   | 2.2.3
> apache-airflow-providers-sqlite | SQLite https://www.sqlite.org/                                                 | 2.1.3
>
> We could probably slim them down further, but that would limit
> extensibility a bit, and I consider 500 MB uncompressed pretty "decent":
> it's ~130-160 MB of compressed data when you pull the image.
>
> J.
>
> On Sun, May 1, 2022 at 5:26 PM Jarek Potiuk wrote:
>
>> Hello everyone,
>>
>> TL;DR: I am looking for consensus on releasing "slim" versions of PROD
>> images: ones that will be much smaller, contain neither providers nor
>> other extras, and are database-specific.
>>
>> Context:
>>
>> Now that we are done with some infra changes that were also released
>> in 2.3.0, I came back to the issue raised in
>> https://github.com/apache/airflow/issues/20849, which was originally
>> about a "vanilla" image for Airflow; I renamed the idea to a "slim"
>> image (following a similar convention used by various distro and Python
>> providers). The issue itself explains why there is a need for such
>> images.
>>
>> The idea is to have a very small "base" ("slim") image that users will
>> be able to extend, not only a "regular" (see the relation with
>> "slim" :D ?) image where we pre-install a set of providers and
>> support multiple database backends.
>>
>> The "slim" images also have the advantage that we can use
>> "no-constraints" dependencies with them, which means that in those
>> images the dependencies are the "latest" that Airflow supports, even if
>> some providers would limit those dependencies.
>>
>> I looked at what it would mean, and what it really translates to is
>> that we would have to push many more images.
>>
>> The bad news:
>>
>> We need to push a matrix of 4 * 3 = 12 new "slim" images (plus some
>> aliases for "latest"):
>> * Python versions: 3.7, 3.8, 3.9, 3.10
>> * Database: postgres, mysql, mssql
>>
>> Postgres images would additionally be multi-platform (AMD64/ARM64), and
>> for now MySQL and MSSQL images would be AMD64-only (until we add ARM
>> support for those).
>> That is a lot of images, but it is a fairly normal approach if you
>> look at the images published by a number of other "platform-like"
>> products.
>>
>> The good news:
>>
>> We only need to do it at release time, and we already have the right
>> set of scripts and parameters to enable that. It will take a bit
>> longer, but those images are much smaller, and building and pushing
>> them is WAY faster than for the regular image.
>>
>> Some comparisons:
>>
>> Size (uncompressed): Regular (1.1G), Slim (500MB)
>> Time to build a single image: Regular (6m), Slim (up to 3m)
>>
>> Overall, the release process would take some 20 minutes longer if we
>> release the slim images (and I already made it a separate step, so it
>> should not block the "regular" release).
>>
>> The very good news:
>>
>> I've actually prepared a PR,
>> https://github.com/apache/airflow/pull/23391, to add this feature
>> (including the docs), and it's a very small change. It does not change
>> any of the Airflow source code or the Dockerfile; we basically need to
>> extend our "dev" script to build and push images to ... build and push
>> more images. I even prepared and pushed 2.3.0 images of Airflow to my
>> private Docker Hub account so that everyone can see how they will look.
>>
>> You can see them here:
>>
>> https://hub.docker.com/repository/docker/potiuk/airflow/tags?page=1&ordering=last_updated&name=2.3.0
>>
>> I **believe** these changes don't even need PMC votes, since this is
>> more a procedural change than a software release, so we **could**
>> release the "slim" 2.3.0 images even now, so that they are available
>> as of 2.3.0. If we see that this is a welcome change (despite the
>> added complexity of the set of images we publish on Docker Hub), it
>> could even be agreed via lazy consensus once we see consensus forming.
>>
>> J.
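
To check which providers actually ship in a given image (for example, to confirm that a slim image contains only the four providers listed in the thread), a small Python sketch using only the standard library's `importlib.metadata` could be run inside the container. This is an illustration, not part of the proposal or PR: the helper name `provider_versions` and the name-prefix matching rule are assumptions made for this example.

```python
"""Sketch: list Airflow provider distributions installed in the current
Python environment (e.g. inside a "slim" image). Hypothetical helper."""
from importlib.metadata import distributions

# Assumption: provider packages are recognised purely by this name prefix.
PROVIDER_PREFIX = "apache-airflow-providers-"


def provider_versions(dists=None):
    """Return {distribution name: version} for installed Airflow providers.

    ``dists`` defaults to every distribution visible to importlib.metadata;
    it can be replaced with any iterable of (name, version) pairs.
    """
    if dists is None:
        dists = ((d.metadata.get("Name"), d.version) for d in distributions())
    found = {}
    for name, version in dists:
        # Normalise case and underscores so "apache_airflow_providers_ftp" also matches.
        normalised = (name or "").lower().replace("_", "-")
        if normalised.startswith(PROVIDER_PREFIX):
            found[normalised] = version
    # Sort by name for stable, readable output.
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in provider_versions().items():
        print(f"{name} | {version}")
```

Passing an explicit list of `(name, version)` pairs makes the filtering easy to try without a real Airflow installation.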
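
The release matrix described in the thread (4 Python versions x 3 databases = 12 slim images, with only the Postgres variants built for both AMD64 and ARM64) can be enumerated with a short Python sketch. The tag format `slim-<airflow>-python<py>-<db>` and the `linux/amd64` / `linux/arm64` platform strings are assumptions for illustration only; the real tag names are whatever PR 23391 defines, and the "latest" aliases are left out.

```python
"""Sketch of the proposed "slim" image matrix: 4 Python versions x 3 databases."""
from itertools import product

PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]
DATABASES = ["postgres", "mysql", "mssql"]


def slim_image_matrix(airflow_version):
    """Return a list of (hypothetical tag, target platforms), one per slim image."""
    images = []
    for python, db in product(PYTHON_VERSIONS, DATABASES):
        # Hypothetical tag format; not taken from the actual release scripts.
        tag = f"slim-{airflow_version}-python{python}-{db}"
        # Postgres images are multi-platform; MySQL/MSSQL are AMD64-only for now.
        platforms = ["linux/amd64", "linux/arm64"] if db == "postgres" else ["linux/amd64"]
        images.append((tag, platforms))
    return images


if __name__ == "__main__":
    matrix = slim_image_matrix("2.3.0")
    print(len(matrix), "images")  # prints "12 images"
    for tag, platforms in matrix:
        print(tag, ",".join(platforms))
```

Keeping the Python versions and databases as two plain lists means adding, say, ARM support for MySQL later is a one-line change to the platform rule rather than a new hand-maintained list of tags.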
