[GitHub] [accumulo-docker] keith-turner opened a new pull request, #20: Make rebuilding the docker image for a new Accumulo snapshot faster.

GitBox Fri, 16 Sep 2022 08:25:29 -0700


keith-turner opened a new pull request, #20:
URL: https://github.com/apache/accumulo-docker/pull/20


   This PR is intentionally incomplete: I am trying to fix a problem I see, 
but I am not sure this is the best approach.
   
   In the past, when testing compactors and scan servers running in Kubernetes, 
I would go through the following process.
   
    1. clone accumulo docker 
    2. manually download hadoop and zookeeper
    3. build a snapshot version of accumulo
    4. build an accumulo docker image
    5. push the docker image to a container repository that the Kubernetes 
cluster can pull from
    6. restart the accumulo processes running in Kubernetes
    7. run some experiments and make some changes to accumulo and then go to 
step 3
   
   Step 4 above takes multiple minutes and creates a 2GB image.  Because the 
image is so large, steps 5 and 6 take a while as the image is uploaded to and 
then downloaded from the repo.  This PR works around these problems by 
doing the following.
   
    * Move the code to download needed deps outside of the docker build file.  
This saves me the time of manually downloading in step 2 above.
    * Split the docker build file into two build files.  The first one builds a 
base image with java, hadoop, and zookeeper.  The second extends the first and 
only has to include Accumulo.
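   
   A download step that lives outside the docker build could look roughly like 
the sketch below.  This is not the script in this PR; the versions, the mirror 
URL, the downloads directory, and the function names are all assumptions made 
for illustration.

```shell
#!/bin/sh
# Hypothetical download helper. The versions, mirror, and target directory
# are illustrative assumptions, not values taken from the PR.
HADOOP_VERSION="${HADOOP_VERSION:-3.3.4}"
ZOOKEEPER_VERSION="${ZOOKEEPER_VERSION:-3.8.0}"
MIRROR="${MIRROR:-https://archive.apache.org/dist}"
DOWNLOAD_DIR="${DOWNLOAD_DIR:-downloads}"

# Print the tarball URL for the configured Hadoop version.
hadoop_url() {
  echo "$MIRROR/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz"
}

# Print the tarball URL for the configured ZooKeeper version.
zookeeper_url() {
  echo "$MIRROR/zookeeper/zookeeper-$ZOOKEEPER_VERSION/apache-zookeeper-$ZOOKEEPER_VERSION-bin.tar.gz"
}

# Download a URL into DOWNLOAD_DIR unless the file is already present,
# so repeated runs do not fetch the same tarball twice.
fetch() {
  url="$1"
  file="$DOWNLOAD_DIR/$(basename "$url")"
  if [ -f "$file" ]; then
    echo "already have $file"
  else
    mkdir -p "$DOWNLOAD_DIR"
    curl -fSL -o "$file" "$url"
  fi
}

# With the argument "fetch" the tarballs are downloaded; with "urls" the
# script only prints what it would download. With no argument it does nothing.
case "${1:-}" in
  fetch)
    fetch "$(hadoop_url)" && fetch "$(zookeeper_url)"
    ;;
  urls)
    hadoop_url
    zookeeper_url
    ;;
esac
```

   Saved as, say, download.sh (a hypothetical name), it would be run once with 
"sh download.sh fetch" before building the base image.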
   
   With the above changes I can have the following workflow.
   
    1. clone accumulo docker 
    2. run download script to get hadoop and zookeeper files
    3. build the accumulo-base docker image that includes java, hadoop, and zookeeper
    4. build a snapshot version of accumulo
    5. build the accumulo docker image that extends accumulo-base and includes 
accumulo
    6. push the docker image to a container repository that the Kubernetes 
cluster can pull from
    7. restart the accumulo processes running in Kubernetes
    8. run some experiments and make some changes to accumulo and then go to 
step 4
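   
   The steps above can be sketched as a small script.  Treat it as a rough 
outline under stated assumptions: the Accumulo source path, the registry name, 
the Dockerfile.base file name, and the Kubernetes deployment name are all 
hypothetical placeholders, and it assumes the second docker build file picks up 
the snapshot tarball from the build context.

```shell
#!/bin/sh
# All names below are hypothetical placeholders, not taken from the PR.
ACCUMULO_SRC="${ACCUMULO_SRC:-../accumulo}"
REGISTRY="${REGISTRY:-registry.example.com/me}"
BASE_DOCKERFILE="${BASE_DOCKERFILE:-Dockerfile.base}"
DEPLOYMENT="${DEPLOYMENT:-deployment/accumulo-compactor}"

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Step 3: build the large base image (java, hadoop, zookeeper). Needed once.
build_base() {
  run docker build -t accumulo-base -f "$BASE_DOCKERFILE" .
}

# Steps 4-7: rebuild the snapshot, build the small image on top of
# accumulo-base, push it, and restart the processes in Kubernetes.
iterate() {
  run mvn -f "$ACCUMULO_SRC/pom.xml" clean package -DskipTests &&
  run docker build -t "$REGISTRY/accumulo" . &&
  run docker push "$REGISTRY/accumulo" &&
  run kubectl rollout restart "$DEPLOYMENT"
}

case "${1:-}" in
  base) build_base ;;
  iterate) iterate ;;
esac
```

   Without RUN=1 the script only prints the commands it would run, which makes 
it easy to check the placeholders before anything is built or pushed.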
   
   Step 5 above takes a few seconds (vs a few minutes) and produces a new image 
where the layers on top of accumulo-base are only ~30MB (visible with the 
docker history command).  The first time steps 6 and 7 run, the large 
accumulo-base image will have to be uploaded and downloaded.  However, on 
subsequent runs of steps 6 and 7 only ~30MB needs to be uploaded and 
downloaded, making those steps much faster.
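   
   One way to check the layer sizes is a thin wrapper around docker history, 
sketched below.  The function name and the image name in the usage note are 
hypothetical; only the docker history command itself comes from the 
description above.

```shell
#!/bin/sh
# Print one line per layer of the given image: its size, then the build
# instruction that created it. Layers added on top of the base image are
# listed first.
layer_sizes() {
  docker history --format '{{.Size}}: {{.CreatedBy}}' "$1"
}

# Only run when an image name is passed as the first argument.
if [ -n "${1:-}" ]; then
  layer_sizes "$1"
fi
```

   Called with the name of the final image, for example "accumulo" if that is 
how it was tagged, the first few lines should show the small layers that sit on 
top of accumulo-base.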
   
   This is a huge improvement for what I am trying to do.  I did just enough 
work to get this functioning.  Before updating the readme and improving the 
docker files and download script, I would like to see if anyone has feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
