potiuk commented on a change in pull request #4936: [AIRFLOW-4115] 
Multi-staging Aiflow Docker image [Step 1/3]
URL: https://github.com/apache/airflow/pull/4936#discussion_r268880220
 
 

 ##########
 File path: .dockerignore
 ##########
 @@ -0,0 +1,107 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# NOTE! This docker ignore uses recommended technique
+# Where everything is excluded by default and you deliberately
+# Add only those directories/files you need. This is very useful
+# To make sure that Docker context is always the same on any machine
+# So that generated files are not accidentally added to the context
+# This allows Docker's `COPY .` to behave in predictable way
+
+# Ignore everything
+**
 
 Review comment:
   Correct with the duplicates. I reviewed and removed it.
   
   There is little worry that we will forget something - the tests will catch 
it (and it should rarely be needed - adding a new folder or file at the top 
level should be really rare event)
   
   And yes - you need those extra specs because they will exclude what should 
not be part of those included dirs above. The pattern I follow is (and I saw it 
suggested by several people):
   
   1) ** -> Exclude everything
   2) !/airflow -> include only those top-level folders/top-level files that 
you need
   3) **/__pycache__/ -> from those folders that you included in 2) exclude all 
the generated content that follows common patterns for generated stuff (like 
*.pyc file or /__pycache__/ folders)
   
   I feel very strong about it - having to debug the context changes (and 
figuring out what the hell changed to trigger copying of the files again) for 
many hours. Example: only after a while I realised that there is a generated 
folder in docs which is _build next to _template (which is not generated, 
thought the name kind of suggest it). And just generating the docs locally 
triggered rebuild of Dockerfile (that's why I did not include the whole docs 
folder :) ). Otherwise I would have to specifically exclude that particular 
folder (if I knew that I have to do it). I really think adding stuff 
deliberately to Docker context makes much more sense. 
   
   Alternative is to move ALL generated content to specific directory, and 
exclude that directory - but this is not always possible (node_modules) and 
from my experience people will not "stick" to it - they will start generating 
random files in different folders. Of course we are not protected if someone 
starts generating "xyz" file/folder inside "airflow" folder. But it's rather 
unlikely someone will do something that crazy.  I think generating something in 
sources is rather bad practice and you do it only if you cannot do it otherwise 
(that's one of the parts I don't like about python - .pyc files are generated 
next to actual source files. It reminds me of times when I used RCS versioning 
system and we had source.c,v next to every file.... BRRR)
   
   But to be honest - It was much bigger problem and I eventually introduced 
checksums for important files (setup/docker) to check if we need to run docker 
build at all.
   
   But still I think it makes it much more resilient for future changes that 
might produce uncontrollable and unpredictable context "growth". 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to