I also like the idea of SLIM images - always helpful.

Howard

On Wed, May 4, 2022 at 4:53 PM Ping Zhang <[email protected]> wrote:

> Hi Jarek,
>
> I really like the idea of having a slim airflow docker image. 500MB
> uncompressed is tiny 👍
>
> Thanks,
>
> Ping
>
> On Sun, May 1, 2022 at 8:41 AM Jarek Potiuk <[email protected]> wrote:
>
>> And just to clarify. Those "slim" images are not at all "toothless". You
>> can actually do stuff with them :)
>>
>> The 4 providers that are preinstalled are there:
>>
>> apache-airflow-providers-ftp    | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114        | 2.1.2
>> apache-airflow-providers-http   | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/        | 2.1.2
>> apache-airflow-providers-imap   | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501 | 2.2.3
>> apache-airflow-providers-sqlite | SQLite https://www.sqlite.org/                                         | 2.1.3
>>
>> We could probably further slim them down but that would limit the
>> extensibility a bit and I consider 500 MB uncompressed as pretty "decent" -
>> it's ~ 130-160 MB of compressed data when you pull the image.
>>
>> J.
>>
>> On Sun, May 1, 2022 at 5:26 PM Jarek Potiuk <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> TL;DR: I am looking for consensus on releasing "slim" versions of PROD
>>> images - ones that will be way smaller and contain no providers nor
>>> other extras and would be database-specific.
>>>
>>> Context:
>>>
>>> Now after we are done with some infra changes that were also released
>>> in 2.3.0 I came back to the issue raised in
>>> https://github.com/apache/airflow/issues/20849 which was originally
>>> about "vanilla" image for Airflow, but I renamed the idea to "slim"
>>> image (following similar convention by various distro and Python
>>> providers). The issue itself explains why there is a need for such
>>> images.
>>>
>>> The idea is to have a very small "base" ("slim") image that users will
>>> be able to extend - not only a "regular" (see the relation with
>>> "slim" :D ?) image where we pre-install a set of providers and
>>> support multiple database backends.
>>>
>>> The "slim" images also have the advantage that we can use
>>> "no-constraints" dependencies with them - which means that in those
>>> images, the dependencies are "latest" that airflow supports even if
>>> some providers would limit the dependencies.
>>>
>>> I looked at what it would mean and really what it translates to is
>>> that we would have to push many more images.
>>>
>>> The bad news:
>>>
>>> We need to push a matrix of 4 * 3 = 12 new "slim" images (plus some
>>> aliases for "latest")
>>> * Python versions: 3.7, 3.8, 3.9, 3.10
>>> * Database: postgres, mysql, mssql
>>>
>>> Postgres images would be additionally multiplatform (AMD64/ARM64) and
>>> for now MySQL and MsSQL would be just AMD64 (until we add support for
>>> ARM for those).
>>> Those are plenty of images, but this is a rather normal approach if
>>> you look at a number of other images published by multiple
>>> "platform-like" products.
>>>
>>> The good news:
>>>
>>> We only need to do it at release time and we already have the right
>>> set of scripts and parameters to enable that. It will take a bit
>>> longer, but those images are much smaller and building and pushing
>>> them is WAY faster than for the regular image.
>>>
>>> Some comparison:
>>>
>>> Size (uncompressed): Regular (1.1G), Slim (500MB)
>>> Time to build single image: Regular (6m), Slim (up to 3m)
>>>
>>> Overall the release process would take some 20 mins longer if we
>>> release the slim images (and I already made it a separate step so it
>>> should not block "regular" release).
>>>
>>> The very good news:
>>>
>>> I've actually prepared a PR:
>>> https://github.com/apache/airflow/pull/23391 to add this feature
>>> (including the docs), and it's a very small change.
>>> It does not change any of the source code of airflow or Dockerfile, we
>>> basically need to extend our "dev" script to build and push images
>>> to ... build and push more images. I actually even .. prepared and
>>> pushed 2.3.0 images of airflow to my private dockerhub account so that
>>> everyone can see what it will look like.
>>>
>>> You can see it here:
>>>
>>> https://hub.docker.com/repository/docker/potiuk/airflow/tags?page=1&ordering=last_updated&name=2.3.0
>>>
>>> I **believe** those changes don't even need PMC votes for release, and
>>> this is more a procedural change than software release, so we
>>> **could** release the "slim" 2.3.0 images even now - so that they are
>>> available as of 2.3.0. I think even if we see that this is a welcome
>>> change (despite the complexity of our dockerhub images available) it
>>> could even be agreed to via lazy-consensus if we see consensus
>>> forming.
>>>
>>> J.
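
For readers who want to check the "4 * 3 = 12" arithmetic in the proposal, the image matrix can be sketched in a few lines of Python. This is only an illustration of the combinations listed in the quoted message: the names `PYTHON_VERSIONS`, `PLATFORMS` and `slim_image_matrix` are hypothetical and are not taken from the Airflow release scripts, and no image tag format is implied.

```python
from itertools import product

# Python versions listed in the proposal.
PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]

# Platforms per database backend, as described in the proposal:
# Postgres is multiplatform, MySQL and MsSQL are AMD64 only for now.
# (Hypothetical structure, used here only for illustration.)
PLATFORMS = {
    "postgres": ["AMD64", "ARM64"],
    "mysql": ["AMD64"],
    "mssql": ["AMD64"],
}


def slim_image_matrix():
    """Return one entry per (Python version, database) combination."""
    return [
        {"python": py, "database": db, "platforms": PLATFORMS[db]}
        for py, db in product(PYTHON_VERSIONS, PLATFORMS)
    ]


if __name__ == "__main__":
    matrix = slim_image_matrix()
    # Number of distinct "slim" images (4 Python versions * 3 databases).
    print(len(matrix))
    # Number of per-platform builds behind those images.
    print(sum(len(entry["platforms"]) for entry in matrix))
```

Counting per-platform builds separately shows that the multiplatform Postgres images account for the extra build work beyond the 12 image combinations.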
