yuvipanda commented on issue #22349:
URL: https://github.com/apache/beam/issues/22349#issuecomment-1190746544

   Thanks a lot for working through this, @alxmrs!
   
   > I actually don't really know what problem Yuvi / the forge container is 
trying to solve in the first place. 
   
   This is a great question! We (the Pangeo project) maintain a set of Docker 
images pre-built with specific pinned versions of common dependencies in the 
earth sciences ecosystem - 
https://github.com/pangeo-data/pangeo-docker-images/. We provide dated tags 
that people can reference and use wherever they need to run code - in 
JupyterHubs (for interactive Jupyter use), in Dask (for scale-out workflows), 
etc. The goal of the `forge/` image is to provide a version that is usable in 
Apache Beam contexts. These are fairly heavy images - the conda-based 
environment build step takes at least 10 minutes, and often longer, to run - 
so we can't really build them at *runtime*. We also want to make sure the 
packages are tested to work together, as they often have complex C (or even 
Fortran!) based dependencies. There's also a reproducibility angle here, as 
specifying the Docker image tag a workflow uses provides a better chance of 
longer term reproducibility than just a list of packages to install.
   
   The goal is for end users to be able to pick a tag and know that it works 
with the rest of the geosciences stack curated by Pangeo. I hope that helps 
clarify the goal of the `forge/` image.
   
   I'm not entirely sure what the original problem with copying the Go binary 
was, as long as we weren't copying the Python packages. Possibly something to 
do with the inherited entrypoint? I'm doing some funky stuff in 
https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-a77643b43a7be453fa8556937bf32b27907e152a10d4c693f3e7670c66a44378
to have the entrypoint work for both Beam and Jupyter.
   

