gortiz commented on issue #8718:
URL: https://github.com/apache/pinot/issues/8718#issuecomment-1172448787
I've just analyzed the docker image.
The three biggest layers are:
- 344MB of base image (jdk-slim), which is highly reusable: any two images
will probably share the same base, so its storage and download cost is
paid only once.
- 617MB of apt-update and apt-install, which is not reusable at all. We can
improve this by creating a specific base image and reusing it.
- 716MB of apache-pinot, which is copied in a single layer. Of that:
  - 100MB are examples, which are very static (they almost never change).
  - 454MB are plugins, which are mostly shaded dependencies. They would
    benefit a lot from docker layers, but we cannot use layers for them
    because they are shaded.
  - 150MB are pinot itself and its dependencies. I guess most of it is
    the dependencies, which again could be layered, but they are shaded.
This means that each time we change a single character and build a new
docker image, we store and download (617 + 716)MB of data in our pods.
I think that almost 1GB of that data is static and could simply be
reused if we used docker layers correctly.
What @xiangfu0 suggested, having different images with more or fewer
plugins, or images that can download plugins at start time, can be a solution,
but I think it would have the side effect of making things considerably harder
for customers to understand. If we instead just use layers correctly, a user
downloading an image for the first time will need to download 1.3GB
of data, but when they later download a second version, most of the layers
will very probably be the same, so they would only need to
download around 150MB of data. The same applies to our own pods, which would
only need to download these 150MB instead of 1.3GB on most upgrades.
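
To make the arithmetic above concrete, here is a rough Python sketch of how a layer cache would behave under both layouts. It is only a model, not how docker is implemented: the `Layer` class, the digest strings and the `pull` helper are made up for illustration, the base image is assumed to be already present on the node (which is how the 1.3GB figure above is counted), and the apt layer is assumed to get a new digest on every build today. The three sub-sizes add up to 704MB rather than 716MB; the remaining 12MB is not broken down above and is left out of the layered variant.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Layer:
    name: str
    size_mb: int
    digest: str  # stand-in for the content hash that identifies a layer


def pull(image: list[Layer], cache: set[str]) -> int:
    """Return the MB downloaded for `image`, adding its layers to `cache`."""
    downloaded = 0
    for layer in image:
        if layer.digest not in cache:
            downloaded += layer.size_mb
            cache.add(layer.digest)
    return downloaded


BASE = Layer("jdk-slim base", 344, "base")


def current_image(version: int) -> list[Layer]:
    # Assumption: the apt layer and the single Pinot layer change on every build.
    return [
        BASE,
        Layer("apt-update + apt-install", 617, f"apt-v{version}"),
        Layer("apache-pinot", 716, f"pinot-v{version}"),
    ]


def layered_image(version: int) -> list[Layer]:
    # Assumption: only Pinot's own code changes between two versions.
    return [
        BASE,
        Layer("apt (shared base image)", 617, "apt"),
        Layer("examples", 100, "examples"),
        Layer("plugins", 454, "plugins"),
        Layer("pinot + dependencies", 150, f"pinot-v{version}"),
    ]


def upgrade_cost(build: Callable[[int], list[Layer]]) -> tuple[int, int]:
    """MB downloaded for version 1, then for version 2, on the same node."""
    cache = {BASE.digest}  # assumption: the base image is already on the node
    first = pull(build(1), cache)
    second = pull(build(2), cache)
    return first, second


print("current:", upgrade_cost(current_image))  # (1333, 1333)
print("layered:", upgrade_cost(layered_image))  # (1321, 150)
```

With these assumptions the current layout pays about 1.3GB on every upgrade, while the layered one pays it once and then around 150MB per upgrade. If plugins or examples do change in a release, their layers would have to be downloaded again as well.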
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]