ctubbsii commented on code in PR #384:
URL: https://github.com/apache/accumulo-website/pull/384#discussion_r1177174353


##########
Dockerfile:
##########
@@ -0,0 +1,40 @@
+# This Dockerfile builds an ruby environment for jekyll that empowers
+# making updates to the accumulo website without requiring the dev
+# to maintain a local ruby development environment.
+
+FROM ruby:2.7.8-slim-bullseye as base
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    git \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /site
+
+
+# Copy over the Gemfiles so that all build dependencies are installed
+# during build vs at runtime.
+
+COPY Gemfile /site/Gemfile
+COPY Gemfile.lock /site/Gemfile.lock

Review Comment:
   > 2. $PWD/_site (location of rendered changes on the host)
   
   Gotcha. Rendered in the container process/memory space, but written to disk 
on the host because that is mounted.
   
   > I completely agree that the `/site` directory name is a bit overused. So 
I'm happy to take suggestions on a better name for the container side of things.
   
   Maybe "/mnt/repo" or "/mnt/staging" or "/mnt/build" or "/mnt/workdir"? I 
like putting things in `/mnt` because it makes things pretty clear that it's 
mounted, and from the perspective from inside the container, this is a pretty 
standard location for external filesystems to be mounted at.
   
   > So everything that happens in the Dockerfile is part of the container 
build stage. This is completely separate from actually running the container.
   
   My question isn't really about WHEN the RUN commands execute (I get that 
it's during the build stage), but WHERE (because it doesn't make sense that 
they would run locally on the host). My understanding is that the build stage 
involves running a container from the base image, executing RUN commands to 
modify the image, and when that build container stops, the resulting image is 
the modified image that we use to launch our own containers.
   
   Essentially:
   
   ```
   docker build -> create temporary build container from base image, start 
build container, exec RUN commands inside build container, stop build 
container, save modified image, delete build container
   docker create -> create container from image
   docker start -> run container with specified command, or stored CMD if 
command is unspecified
   docker run -> shorthand for docker create + docker start
   ```
   
   > This ensures that no site files are embedded in a container and that the 
source repository remains the only location for changes to occur.
   
   Some rendering might use /tmp inside the container. I'm not sure if any of 
that is preserved inside the container, but I don't think we care if it is or 
isn't.
   
   > Now, we want to mount the rendered content into the validator container. 
However, the webdev-validator container needs to have a slightly modified 
Gemfile.
   > 
   > To accomplish this, the run command only mounts $PWD/_site to /site/_site. 
This sets up the following structure
   
   My understanding here is that the build of the validator container does 
*not* have the repo mounted. So, the Gemfiles it is modifying are the ones 
copied to the base image when that was built.
   
   The confusion here is that when we run a container from the base image, 
we're mounting over the same directory as we copied these files. Are these just 
masked/hidden by the mount, then? Usually it's a good idea to mount to an empty 
directory. But in this case, it seems we're not doing that in the base image.
   
   If we're mounting over the copied Gemfiles, then there's no reason to create 
a second validator image... we can just modify the Gemfile during the base 
build, install *all* the gems to the image during the base image build. When we 
mount our website repo to render the Jekyll, we hide the modified Gemfiles. 
When we mount our rendered `_site` directory to do the validation, we re-expose 
the modified Gemfiles. We don't need a second image at all... we can do 
everything we want with the single image.
   
   Actually, we don't even need to copy over the Gemfiles during the build 
stage... those aren't needed for the htmlproofer. htmlproofer has need of its 
own Gemfiles. We're only copying over the ones from the repo and modifying 
them... but we don't need to do that... we can just have a separate set for the 
htmlproofer... and we don't even need the lock one at all for that.
   
   I think we can simplify things dramatically by just using one container.
   
   I do have one more question about WORKDIR... it seems that's overloaded to 
specify the current working directory inside the build container as well as the 
runtime container. But, I don't see why that necessarily needs to be the case.
   
   Also, one thing I was thinking was that it can be really hard to verify 
links using the rendered HTML, because you don't have an absolute directory to 
resolve absolute paths. The generated site is intended to be served at the root 
of a webserver. I don't know if htmlproofer supports this, but it'd probably be 
better to run the validation against `http://localhost:4000/` rather than 
`file:///site/_site/`.
   
   If we want to get this in quickly, we could get the Dockerfile in without 
the validator stuff right away. I think that because the validator stuff adds 
so much complexity (extra steps, extra documentation, maybe extra containers) 
that still needs polishing, I want to go back to my much earlier opinion that 
the validation stuff should be added in a separate PR. At this point, I think 
it's likely to hold up this PR further.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to