I had a discussion with Gerardo last night and realized that it is
not obvious to everyone how the whole image building works now and how
it is supposed to work with the multi-layered images.

I think pictures might work best, so I quickly drew an architecture
diagram and a "life of an image" diagram. The images and editable
diagrams are now in AIP-10
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-10+Multi-layered+and+multi-stage+official+Airflow+image>.
I hope they help with grasping the concept.

J.

Principal Software Engineer
Phone: +48660796129

On Tue, Mar 19, 2019 at 00:00, Jarek Potiuk <[email protected]>
wrote:

> After some initial discussion and suggestion from Daniel, I split the
> change into three separate PRs which can be reviewed and merged separately:
>
>
>    - AIRFLOW-4115 JIRA
>    <https://issues.apache.org/jira/browse/AIRFLOW-4115>, PR
>    <https://github.com/apache/airflow/pull/4936> - Docker file for Main
>    airflow image is multi-staging and has multiple layers
>
> followed by
>
>    - AIRFLOW-4116 JIRA
>    <https://issues.apache.org/jira/browse/AIRFLOW-4116>, PR
>    <https://github.com/apache/airflow/pull/4937> - Support for Main/CI
>    images in single Dockerfile
>
> followed by
>
>    - AIRFLOW-4117 JIRA
>    <https://issues.apache.org/jira/browse/AIRFLOW-4117>, PR
>    <https://github.com/apache/airflow/pull/4938>- Travis CI uses
>    multi-stage Docker image to run tests
>
>
> J.
>
> On Mon, Mar 18, 2019 at 2:23 AM Jarek Potiuk <[email protected]>
> wrote:
>
>> Hello everyone,
>>
>> I believe I am now ready to involve more people from the community in the
>> multi-layered Docker AIP-10 that I have been working on for some time (with
>> comments and encouragement from Ash and Fokko as explained in the AIP
>> thread).
>>
>> Any comments, questions, critique, improvement proposals, or even help :)
>> is more than welcome.
>>
>> The work is still WIP: https://github.com/apache/airflow/pull/4543
>>
>> The AIP Confluence page (fairly detailed already) is in
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-10+Multi-layered+and+multi-stage+official+Airflow+image
>> - I think it is the best place for the discussion (as Bas suggested in the
>> AIP thread)
>>
>> I am still working on making the tests on Travis green, but I am on a
>> good path. I'd appreciate any help with it. Especially with the Kubernetes
>> tests which will likely need some small fixes in the environment or maybe
>> even switching to minikube's Docker image in docker-compose.
>>
>> What works now (I think it addresses quite a lot of the concerns Fokko
>> mentioned):
>>
>>    - Tox is removed and replaced with pure-docker execution of tests
>>    (yay!)
>>    - The same Dockerfile is used for both the "slim" Airflow image and
>>    the Airflow CI image used for tests. Once we merge it, we will be able
>>    to deprecate the incubator-airflow-ci image.
>>    - Part of the PR is also related to "Simplified development
>>    environment - AIP-7" (aka Airflow Breeze). I have a nice working Breeze
>>    environment as part of the change now - I will split it off eventually to
>>    separate discussion/PR but for now it makes it easier for me to run tests
>>    so I keep it in.
>>    - The Multi-staging/multi-layered Dockerfile should already improve
>>    CI build "purity". The way "layers" work now is that PIP dependencies
>>    are effectively frozen in-between setup.py changes. Only when setup.py
>>    changes are the corresponding layers rebuilt and dependencies
>>    re-installed. That should provide "out-of-the-box" better stability of
>>    CI builds even before we solve the dependency problem in a more
>>    "systematic" way (as Fokko mentioned, we should have a separate AIP for
>>    that). I am happy to discuss more, either now or in the future AIP.
>>    Fixing this eventually is quite close to my interests as well.
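
To illustrate the "frozen in-between setup.py changes" behaviour described
above, here is a minimal sketch in Python. It is not the actual Dockerfile
or Docker's cache implementation: the class and function names are
hypothetical, and it only assumes that the dependency layer is keyed on the
content of setup.py, so that source-only changes reuse the cached layer.

```python
import hashlib


def layer_key(setup_py_text: str) -> str:
    """Hypothetical cache key: a digest of the setup.py content."""
    return hashlib.sha256(setup_py_text.encode("utf-8")).hexdigest()


class DependencyLayerCache:
    """Toy model of a dependency layer that is rebuilt only when setup.py changes."""

    def __init__(self) -> None:
        self._key = None   # key of the currently cached dependency layer
        self.rebuilds = 0  # how many times dependencies were re-installed

    def build(self, setup_py_text: str) -> str:
        key = layer_key(setup_py_text)
        if key != self._key:
            # setup.py changed: rebuild the layer and re-install dependencies.
            self._key = key
            self.rebuilds += 1
            return "rebuilt"
        # setup.py unchanged: dependencies stay frozen, the layer is reused.
        return "cached"


if __name__ == "__main__":
    cache = DependencyLayerCache()
    print(cache.build("install_requires=['a>=1.0']"))         # first build
    print(cache.build("install_requires=['a>=1.0']"))         # source-only change
    print(cache.build("install_requires=['a>=1.0', 'b']"))    # setup.py changed
```

In this model a commit that touches only Airflow sources leaves the
dependency layer untouched, which is where the CI stability is expected to
come from.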
>>
>> I went through several iterations and what I came up with is already
>> quite simple and straightforward compared to some initial approaches I
>> took.
>>
>> I added a quite detailed description and motivation, the proposed design,
>> and even measured the impact of layering on build times (all in the AIP-10
>> Confluence page).
>>
>> I will continue fixing tests and rebasing the changes for some time (even
>> a few weeks if needed) to test how it behaves with real changes coming
>> regularly.
>>
>> For now I have a separate DockerHub build and Travis CI instance where I
>> will keep running the tests (automatically):
>>
>>    - DockerHub:
>>    https://cloud.docker.com/repository/docker/potiuk/airflow/timeline
>>    - Travis CI: https://travis-ci.org/potiuk/airflow/builds
>>
>> J.
>>
>>
>>
>> On Thu, Jan 17, 2019 at 12:12 PM Jarek Potiuk <[email protected]>
>> wrote:
>>
>>> I've updated the calculations after removing some artifacts and
>>> rebuilding the images from scratch. Here are the updated conclusions:
>>>
>>>
>>>    - The multi-layered image is only slightly bigger than the
>>>    mono-layered one (around *2% more* in total). Download time is
>>>    also slightly longer, by 1 s (33.7 s vs. 32.7 s), which is *3% longer*.
>>>    - Downloading the image regularly by the users is way better in case
>>>    of multi-layered image. For a simulated user downloading the airflow
>>>    image twice a week it is *4950 MB* (multi-layered) vs. *13546 MB*
>>>    (mono-layered) of downloads over the course of 8 weeks, yielding *64%
>>>    less data* to download.
>>>    - Multi-layered image seems to be much better for users regularly
>>>    downloading the image.
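
The percentages above can be re-derived from the quoted figures with a short
calculation. This is only an arithmetic sketch using the numbers from this
message; the helper name is hypothetical.

```python
def percent_less(smaller: float, larger: float) -> float:
    """Relative saving of `smaller` compared to `larger`, in percent."""
    return (1 - smaller / larger) * 100


# Figures quoted in the message above.
multi_layered_mb = 4950    # 8 weeks of downloads, multi-layered image
mono_layered_mb = 13546    # 8 weeks of downloads, mono-layered image
multi_download_s = 33.7    # single download, multi-layered image
mono_download_s = 32.7     # single download, mono-layered image

saving = percent_less(multi_layered_mb, mono_layered_mb)
time_increase = (multi_download_s - mono_download_s) / mono_download_s * 100

if __name__ == "__main__":
    print(f"{saving:.1f}% less data over 8 weeks")
    print(f"{time_increase:.1f}% longer single download")
```

With these inputs the saving comes out at about 63.5% and the single
download at about 3.1% longer, in line with the rounded figures quoted
above.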
>>>
>>>
>>> On Wed, Jan 16, 2019 at 10:59 PM Jarek Potiuk <[email protected]>
>>> wrote:
>>>
>>>> Hello Everyone,
>>>>
>>>> Following the discussion we had on the mono-layered vs. multi-layered
>>>> official image for Airflow here
>>>> https://github.com/apache/airflow/pull/4483, I prepared a
>>>> proof-of-concept PR of a multi-layered image (based on the mono-layered
>>>> one), performed calculations, and reached some conclusions in this
>>>> proposal (I wanted to have some hard numbers to back the statement that
>>>> a multi-layered Dockerfile is better):
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-10+Multi-layered+official+Airflow+image
>>>>
>>>> The conclusions I reached:
>>>>
>>>>    - The multi-layered image is even slightly smaller than the
>>>>    mono-layered one, so the multi-layered image is even better when you
>>>>    download it once.
>>>>    - Downloading the image regularly by the users is way better in
>>>>    case of multi-layered image. For a simulated user downloading the
>>>>    airflow image twice a week it is 5.7 GB (multi-layered) vs. 16.15 GB
>>>>    (mono-layered) of downloads over the course of 8 weeks.
>>>>    - Multi-layered image is the better choice.
>>>>
>>>>
>>>> I based those calculations on the PR I prepared:
>>>> https://github.com/apache/airflow/pull/4543 where I implemented rather
>>>> nice multi-layered Dockerfile that can be easily maintained.
>>>>
>>>> It's based on my experience with Airflow Breeze
>>>> <https://github.com/PolideaInternal/airflow-breeze> - the GCP
>>>> development environment we used to develop 30+ GCP-based operators
>>>> recently.
>>>>
>>>> I hope we can reach the conclusion as the community that multi-layered
>>>> is better and that we can go in this direction :). I am happy to iterate on
>>>> my PR to make it even better.
>>>>
>>>> J.
>>>>
>>>>
>>>> --
>>>>
>>>> Jarek Potiuk
>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>
>>>> M: +48 660 796 129 <+48660796129>
>>>> E: [email protected]
>>>>
