ctubbsii commented on code in PR #384: URL: https://github.com/apache/accumulo-website/pull/384#discussion_r1177174353
########## Dockerfile: ########## @@ -0,0 +1,40 @@ +# This Dockerfile builds an ruby environment for jekyll that empowers +# making updates to the accumulo website without requiring the dev +# to maintain a local ruby development environment. + +FROM ruby:2.7.8-slim-bullseye as base + +RUN apt-get update && apt-get install -y --no-install-recommends \ + build-essential \ + git \ + curl \ + && rm -rf /var/lib/apt/lists/* + +WORKDIR /site + + +# Copy over the Gemfiles so that all build dependencies are installed +# during build vs at runtime. + +COPY Gemfile /site/Gemfile +COPY Gemfile.lock /site/Gemfile.lock Review Comment: > 2. $PWD/_site (location of rendered changes on the host) Gotcha. Rendered in the container process/memory space, but written to disk on the host because that is mounted. > I completely agree that the `/site` directory name is a bit overused. So I'm happy to take suggestions on a better name for the container side of things. Maybe "/mnt/repo" or "/mnt/staging" or "/mnt/build" or "/mnt/workdir"? I like putting things in `/mnt` because it makes things pretty clear that it's mounted, and from the perspective from inside the container, this is a pretty standard location for external filesystems to be mounted at. > So everything that happens in the Dockerfile is part of the container build stage. This is completely separate from actually running the container. My question isn't really about WHEN the RUN commands execute (I get that it's during the build stage), but WHERE (because it doesn't make sense that they would run locally on the host). My understanding is that the build stage involves running a container from the base image, executing RUN commands to modify the image, and when that build container stops, the resulting image is the modified image that we use to launch our own containers. Essentially: ``` docker build -> create temporary build container from base image, start build container, exec RUN commands inside build container, stop build container, save modified image, delete build container docker create -> create container from image docker start -> run container with specified command, or stored CMD if command is unspecified docker run -> shorthand for docker create + docker start ``` > This ensures that no site files are embedded in a container and that the source repository remains the only location for changes to occur. Some rendering might use /tmp inside the container. I'm not sure if any of that is preserved inside the container, but I don't think we care if it is or isn't. > Now, we want to mount the rendered content into the validator container. However, the webdev-validator container needs to have a slightly modified Gemfile. > > To accomplish this, the run command only mounts $PWD/_site to /site/_site. This sets up the following structure My understanding here is that the build of the validator container does *not* have the repo mounted. So, the Gemfiles it is modifying are the ones copied to the base image when that was built. The confusion here is that when we run a container from the base image, we're mounting over the same directory as we copied these files. Are these just masked/hidden by the mount, then? Usually it's a good idea to mount to an empty directory. But in this case, it seems we're not doing that in the base image. If we're mounting over the copied Gemfiles, then there's no reason to create a second validator image... we can just modify the Gemfile during the base build, install *all* the gems to the image during the base image build. When we mount our website repo to render the Jekyll, we hide the modified Gemfiles. When we mount our rendered `_site` directory to do the validation, we re-expose the modified Gemfiles. We don't need a second image at all... we can do everything we want with the single image. Actually, we don't even need to copy over the Gemfiles during the build stage... those aren't needed for the htmlproofer. htmlproofer has need of its own Gemfiles. We're only copying over the ones from the repo and modifying them... but we don't need to do that... we can just have a separate set for the htmlproofer... and we don't even need the lock one at all for that. I think we can simplify things dramatically by just using one container. I do have one more question about WORKDIR... it seems that's overloaded to specify the current working directory inside the build container as well as the runtime container. But, I don't see why that necessarily needs to be the case. Also, one thing I was thinking was that it can be really hard to verify links using the rendered HTML, because you don't have an absolute directory to resolve absolute paths. The generated site is intended to be served at the root of a webserver. I don't know if htmlproofer supports this, but it'd probably be better to run the validation against `http://localhost:4000/` rather than `file:///site/_site/`. If we want to get this in quickly, we could get the Dockerfile in without the validator stuff right away. I think that because the validator stuff adds so much complexity (extra steps, extra documentation, maybe extra containers) that still needs polishing, I want to go back to my much earlier opinion that the validation stuff should be added in a separate PR. At this point, I think it's likely to hold up this PR further. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
