I've updated the calculations after removing some artifacts and rebuilding the images from scratch. Here are the updated conclusions:
- The multi-layered image is only slightly bigger than the mono-layered one (around *2% more* in total). Download time is also slightly longer, by 1 s (33.7 s vs. 32.7 s), which is *3% longer*.
- Downloading the image regularly is much better with the multi-layered image: for a simulated user downloading the Airflow image twice a week over 8 weeks, it is *4950 MB* (multi-layered) vs. *13546 MB* (mono-layered) of downloads, i.e. *about 63.5% less data* to download.
- The multi-layered image seems to be much better for users who download the image regularly.

On Wed, Jan 16, 2019 at 10:59 PM Jarek Potiuk <[email protected]> wrote:

> Hello Everyone,
>
> Following the discussion we had on Mono-layered vs. Multi-layered official
> image for Airflow here https://github.com/apache/airflow/pull/4483, I
> prepared a proof-of-concept PR of a multi-layered image (based on the
> mono-layered one), performed calculations and reached some conclusions
> in this proposal (I wanted to have some hard numbers to back the statement
> that the multi-layered Dockerfile is better):
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-10+Multi-layered+official+Airflow+image
>
> The conclusions I reached:
>
> - The multi-layered image is even slightly smaller than the
>   mono-layered one, so the multi-layered image is better even when you
>   download it once.
> - Downloading the image regularly is much better with the
>   multi-layered image: for a simulated user downloading the Airflow image
>   twice a week, it is 5.7 GB (multi-layered) vs. 16.15 GB (mono-layered)
>   of downloads over the course of 8 weeks.
> - The multi-layered image is the better choice.
>
> I based those calculations on the PR I prepared:
> https://github.com/apache/airflow/pull/4543, where I implemented a rather
> nice multi-layered Dockerfile that can be easily maintained.
>
> It's based on my experience with Airflow Breeze
> <https://github.com/PolideaInternal/airflow-breeze>, the GCP development
> environment we recently used to develop 30+ GCP-based operators.
>
> I hope we can reach the conclusion as a community that multi-layered is
> better and that we can go in this direction :). I am happy to iterate on my
> PR to make it even better.
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> E: [email protected]
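The percentages in the updated conclusions can be re-derived from the raw figures. This is a minimal sketch that uses only the numbers quoted in this mail; the 2% size difference is not recomputed because the absolute image sizes are not given here. The function names are hypothetical helpers for illustration.

```python
# Figures taken from the updated conclusions in this mail.
MULTI_DOWNLOAD_S = 33.7   # download time of the multi-layered image, seconds
MONO_DOWNLOAD_S = 32.7    # download time of the mono-layered image, seconds
MULTI_TOTAL_MB = 4950     # total downloaded over 8 weeks, multi-layered
MONO_TOTAL_MB = 13546     # total downloaded over 8 weeks, mono-layered


def percent_longer(new: float, base: float) -> float:
    """How much larger `new` is than `base`, in percent (hypothetical helper)."""
    return (new - base) / base * 100


def percent_saved(new: float, base: float) -> float:
    """How much smaller `new` is than `base`, in percent (hypothetical helper)."""
    return (1 - new / base) * 100


slowdown = percent_longer(MULTI_DOWNLOAD_S, MONO_DOWNLOAD_S)
savings = percent_saved(MULTI_TOTAL_MB, MONO_TOTAL_MB)
pulls = 2 * 8  # the simulated user pulls twice a week for 8 weeks

print(f"download time: +{slowdown:.1f}%")
print(f"data saved over {pulls} pulls: {savings:.1f}%")
```

The one-off download is about 3% slower, while the recurring pulls save close to two thirds of the data, which is why the regular-download scenario dominates the comparison.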

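Regular pulls are cheaper with a multi-layered image because Docker only downloads layers whose digests are not already present locally. The sketch below models that behaviour in a simplified way, assuming a local cache that is never pruned and ignoring compression: each image version is a list of (digest, size) pairs, and a pull downloads only the layers not seen before. The layer names and sizes are hypothetical and are not the figures used in AIP-10.

```python
from typing import List, Set, Tuple

Layer = Tuple[str, int]  # (layer digest, size in MB); hypothetical values


def total_downloaded(pulls: List[List[Layer]]) -> int:
    """Total MB downloaded over a sequence of pulls, assuming a layer whose
    digest is already in the local cache is never downloaded again."""
    cache: Set[str] = set()
    total = 0
    for image in pulls:
        for digest, size in image:
            if digest not in cache:
                cache.add(digest)
                total += size
    return total


# Hypothetical multi-layered image: a stable base layer, a dependencies layer
# that changes rarely, and a small sources layer that changes on every build.
multi = [
    [("base", 100), ("deps-1", 300), ("src-1", 20)],
    [("base", 100), ("deps-1", 300), ("src-2", 20)],
    [("base", 100), ("deps-2", 300), ("src-3", 20)],
]

# Hypothetical mono-layered image: everything lives in one layer, so any
# change produces a new digest and the whole layer is downloaded again.
mono = [[("all-1", 410)], [("all-2", 410)], [("all-3", 410)]]

print("multi-layered:", total_downloaded(multi), "MB")
print("mono-layered:", total_downloaded(mono), "MB")
```

In this toy model the multi-layered image is slightly bigger on a single pull (420 MB vs. 410 MB) but costs 760 MB vs. 1230 MB over three pulls, mirroring the trade-off described above: the gain comes from keeping rarely-changing content in layers that later builds reuse.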