leezu commented on a change in pull request #18574: URL: https://github.com/apache/incubator-mxnet/pull/18574#discussion_r461244621
########## File path: ci/docker/docker-compose.yml ########## @@ -108,6 +108,16 @@ services: BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 cache_from: - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu101:latest + ubuntu_gpu_cu102: Review comment: Let's delete the ubuntu_gpu_cu101 then? ########## File path: ci/docker/docker-compose.yml ########## @@ -108,6 +108,16 @@ services: BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 cache_from: - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu101:latest + ubuntu_gpu_cu102: Review comment: Generally we can't test all possible cuda versions and thus our strategy so far is to test the oldest and newest supported version. For simplicity, I'd recommend to stick with the existing strategy and update all tests to 10.2. ########## File path: ci/docker/Dockerfile.build.ubuntu ########## @@ -110,6 +108,15 @@ COPY install/requirements /work/ RUN python3 -m pip install cmake==3.16.6 && \ python3 -m pip install -r /work/requirements +RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && \ + cd protobuf && \ + ./autogen.sh && \ + ./configure && \ + make -j$(nproc) install && \ + cd .. && \ + rm -rf protobuf && \ + ldconfig Review comment: Is the version provided by Ubuntu too old? ########## File path: ci/docker/Dockerfile.build.ubuntu ########## @@ -110,6 +108,15 @@ COPY install/requirements /work/ RUN python3 -m pip install cmake==3.16.6 && \ python3 -m pip install -r /work/requirements +RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && \ + cd protobuf && \ + ./autogen.sh && \ + ./configure && \ + make -j$(nproc) install && \ + cd .. && \ + rm -rf protobuf && \ + ldconfig Review comment: Do you know that you can reproduce the CI build via `python ci/build.py -R --platform ubuntu_gpu_cu102 /work/runtime_functions.sh build_ubuntu_gpu_tensorrt`? ########## File path: ci/docker/Dockerfile.build.ubuntu ########## @@ -110,6 +108,15 @@ COPY install/requirements /work/ RUN python3 -m pip install cmake==3.16.6 && \ python3 -m pip install -r /work/requirements +RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && \ + cd protobuf && \ + ./autogen.sh && \ + ./configure && \ + make -j$(nproc) install && \ + cd .. && \ + rm -rf protobuf && \ + ldconfig Review comment: Do you know that you can reproduce the CI build via `python ci/build.py --platform ubuntu_gpu_cu102 /work/runtime_functions.sh build_ubuntu_gpu_tensorrt`? ########## File path: ci/docker/Dockerfile.build.ubuntu ########## @@ -110,6 +108,15 @@ COPY install/requirements /work/ RUN python3 -m pip install cmake==3.16.6 && \ python3 -m pip install -r /work/requirements +RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && \ + cd protobuf && \ + ./autogen.sh && \ + ./configure && \ + make -j$(nproc) install && \ + cd .. && \ + rm -rf protobuf && \ + ldconfig Review comment: Do you know that you can reproduce the CI build via `python ci/build.py --platform ubuntu_gpu_cu102 /work/runtime_functions.sh build_ubuntu_gpu_tensorrt`? If the newer version is really needed, we should also update the documentation. ########## File path: ci/docker/Dockerfile.build.ubuntu ########## @@ -137,17 +137,27 @@ RUN cd /usr/local && \ # https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact ARG BASE_IMAGE RUN export SHORT_CUDA_VERSION=${CUDA_VERSION%.*} && \ + wget -O nvidia-ml.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb && \ + dpkg -i nvidia-ml.deb && \ Review comment: Currently the container already contains the machine learning repo in `/etc/apt/sources.list.d/nvidia-ml.list`. Why would the container team decide to break their users by removing the machine learning repo? Is there any other reason for re-installing the already existing repository? ########## File path: ci/docker/Dockerfile.build.ubuntu ########## @@ -137,17 +137,27 @@ RUN cd /usr/local && \ # https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact ARG BASE_IMAGE RUN export SHORT_CUDA_VERSION=${CUDA_VERSION%.*} && \ + wget -O nvidia-ml.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb && \ + dpkg -i nvidia-ml.deb && \ Review comment: I noticed the following issue with the container though I'm not sure about the impact https://gitlab.com/nvidia/container-images/cuda/-/issues/82 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org