potiuk commented on issue #19970:
URL: https://github.com/apache/airflow/issues/19970#issuecomment-992774799


   Very good questions :).
   
   I think for now you should start small (2.) and just get the images built 
with all the parameters. The reason the logic is so complex is that it is 
heavily optimized for rebuild time and it handles some cases on Linux where 
permissions of the files have to be updated when they are generated inside the 
container etc. This is some of the most complex Bash code I ever wrote, so 
I certainly do not expect us to get all of that working immediately.
   
   Also I think I made one huge mistake when I developed it - I tried to use 
the same code to build CI images and PROD images - but in the case of Bash such 
"common" code becomes extremely unreadable and complex when you try to make it 
serve various purposes.
   
   So I think what you should really focus on is just implementing the 
"build-ci-image" command (and I think we should move away from "build-image 
--production"; we should have separate `build-ci-image` and 
`build-prod-image` commands). There is enough difference between them to make 
them separate (and maybe reuse some Python code, which will be much easier than 
Bash code reuse).
   
   For now I think what we really need is:
   
   * get some structure in Python that should keep all the necessary parameters 
to build the image (TypedDict ?) 
   * get a function that will return the structure based on (initially) command 
line parameters/flags (go through the list of parameters and implement those 
that will be useful to get the full image build) - if in doubt whether a 
parameter should be used, ask
   * eventually this function should also take into account "last used" params 
for some of the parameters (some of those parameters are stored in .build - I 
think only .PYTHON is used for build). But let's not worry about that yet 
either. This will be much more useful later when we get to the "shell" command.
   * do not worry about "rebuild if needed", computing md5, pulling remote 
etc. There is a separate issue to implement those; similarly, fixing ownership 
etc. will be done later
   * skip "cache" parameters for now. We should only use "local" cache for now 
- this will speed up testing and iterations.
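   
   To make the first two points concrete, here is a minimal sketch of what I 
mean, not a final design. The field names, the flag names, the default Python 
version and the default tag format are all made up for illustration - the real 
list should come from going through the existing parameters.

```python
import argparse
from typing import List, Optional, TypedDict


class CiImageParams(TypedDict):
    """Everything needed to build the CI image (hypothetical field names)."""

    python_version: str
    extras: List[str]
    image_tag: str


def ci_image_params_from_args(argv: Optional[List[str]] = None) -> CiImageParams:
    """Build the parameter structure from command line flags only.

    "Last used" values and cache settings are deliberately left out for now.
    """
    parser = argparse.ArgumentParser(prog="build-ci-image")
    # Assumed flags and defaults, chosen only for this sketch.
    parser.add_argument("--python", default="3.8")
    parser.add_argument("--extras", default="")
    parser.add_argument("--tag", default=None)
    args = parser.parse_args(argv)

    # Split the comma-separated extras and drop empty entries.
    extras = [extra for extra in args.extras.split(",") if extra]
    # Fall back to a tag derived from the Python version (assumed format).
    tag = args.tag or f"airflow-ci:python{args.python}"
    return CiImageParams(python_version=args.python, extras=extras, image_tag=tag)
```

   Calling `ci_image_params_from_args(["--python", "3.9"])` gives back a plain 
dict, so it stays easy to print, compare and pass around while we iterate.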
   
   Your goal should really be: let's be able to build the CI image with the 
"build-image --python NN --.... " command with a number of variants possible by 
specifying the right parameters. Eventually it is all about converting the 
"simple" parameters into the "docker command" which should be run.
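   
   As a rough illustration of that conversion (a sketch under assumptions, not 
the actual implementation): the function name, the default Dockerfile name and 
the build-arg names below are hypothetical. Only the `docker build` flags 
(`--file`, `--tag`, `--build-arg`) are standard docker CLI options.

```python
from typing import Dict, List


def docker_build_command(
    image_tag: str,
    build_args: Dict[str, str],
    dockerfile: str = "Dockerfile.ci",  # assumed default, adjust as needed
    context: str = ".",
) -> List[str]:
    """Turn "simple" parameters into the argument list for `docker build`."""
    cmd = ["docker", "build", "--file", dockerfile, "--tag", image_tag]
    # Sort the build args so the generated command is stable between runs.
    for name, value in sorted(build_args.items()):
        cmd += ["--build-arg", f"{name}={value}"]
    # The build context always goes last.
    cmd.append(context)
    return cmd
```

   Returning a list rather than a string means it can go straight to 
`subprocess.run(cmd, check=True)` without any shell quoting problems, and it is 
trivial to unit test without running docker at all.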
   
   Don't try to replicate the "structure" of code from Bash. The Python 
structure should be very different and much more pythonic, also a lot of the 
reuse will be done differently than it was done in Bash (partly due to Bash 
limitations, partly due to historical evolution of Breeze, partly due to 
mistakes I made when I designed it).
   
   
   


