leezu commented on a change in pull request #18574:
URL: https://github.com/apache/incubator-mxnet/pull/18574#discussion_r461244621



##########
File path: ci/docker/docker-compose.yml
##########
@@ -108,6 +108,16 @@ services:
         BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
       cache_from:
         - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu101:latest
+  ubuntu_gpu_cu102:

Review comment:
       Let's delete the ubuntu_gpu_cu101 then?

##########
File path: ci/docker/docker-compose.yml
##########
@@ -108,6 +108,16 @@ services:
         BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
       cache_from:
         - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu101:latest
+  ubuntu_gpu_cu102:

Review comment:
       Generally we can't test all possible cuda versions and thus our strategy 
so far is to test the oldest and newest supported version. For simplicity, I'd 
recommend to stick with the existing strategy and update all tests to 10.2.

##########
File path: ci/docker/Dockerfile.build.ubuntu
##########
@@ -110,6 +108,15 @@ COPY install/requirements /work/
 RUN python3 -m pip install cmake==3.16.6 && \
     python3 -m pip install -r /work/requirements
 
+RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && 
\
+    cd protobuf && \
+    ./autogen.sh && \
+    ./configure && \
+    make -j$(nproc) install && \
+    cd .. && \
+    rm -rf protobuf && \
+    ldconfig

Review comment:
       Is the version provided by Ubuntu too old?

##########
File path: ci/docker/Dockerfile.build.ubuntu
##########
@@ -110,6 +108,15 @@ COPY install/requirements /work/
 RUN python3 -m pip install cmake==3.16.6 && \
     python3 -m pip install -r /work/requirements
 
+RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && 
\
+    cd protobuf && \
+    ./autogen.sh && \
+    ./configure && \
+    make -j$(nproc) install && \
+    cd .. && \
+    rm -rf protobuf && \
+    ldconfig

Review comment:
       Do you know that you can reproduce the CI build via `python ci/build.py 
-R --platform ubuntu_gpu_cu102 /work/runtime_functions.sh 
build_ubuntu_gpu_tensorrt`?

##########
File path: ci/docker/Dockerfile.build.ubuntu
##########
@@ -110,6 +108,15 @@ COPY install/requirements /work/
 RUN python3 -m pip install cmake==3.16.6 && \
     python3 -m pip install -r /work/requirements
 
+RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && 
\
+    cd protobuf && \
+    ./autogen.sh && \
+    ./configure && \
+    make -j$(nproc) install && \
+    cd .. && \
+    rm -rf protobuf && \
+    ldconfig

Review comment:
       Do you know that you can reproduce the CI build via `python ci/build.py 
--platform ubuntu_gpu_cu102 /work/runtime_functions.sh 
build_ubuntu_gpu_tensorrt`?

##########
File path: ci/docker/Dockerfile.build.ubuntu
##########
@@ -110,6 +108,15 @@ COPY install/requirements /work/
 RUN python3 -m pip install cmake==3.16.6 && \
     python3 -m pip install -r /work/requirements
 
+RUN git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git && 
\
+    cd protobuf && \
+    ./autogen.sh && \
+    ./configure && \
+    make -j$(nproc) install && \
+    cd .. && \
+    rm -rf protobuf && \
+    ldconfig

Review comment:
       Do you know that you can reproduce the CI build via `python ci/build.py 
--platform ubuntu_gpu_cu102 /work/runtime_functions.sh 
build_ubuntu_gpu_tensorrt`? If the newer version is really needed, we should 
also update the documentation.

##########
File path: ci/docker/Dockerfile.build.ubuntu
##########
@@ -137,17 +137,27 @@ RUN cd /usr/local && \
 # 
https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
 ARG BASE_IMAGE
 RUN export SHORT_CUDA_VERSION=${CUDA_VERSION%.*} && \
+    wget -O nvidia-ml.deb 
https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
 && \
+    dpkg -i nvidia-ml.deb && \

Review comment:
       Currently the container already contains the machine learning repo in 
`/etc/apt/sources.list.d/nvidia-ml.list`. Why would the container team decide 
to break their users by removing the machine learning repo? Is there any other 
reason for re-installing the already existing repository?

##########
File path: ci/docker/Dockerfile.build.ubuntu
##########
@@ -137,17 +137,27 @@ RUN cd /usr/local && \
 # 
https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
 ARG BASE_IMAGE
 RUN export SHORT_CUDA_VERSION=${CUDA_VERSION%.*} && \
+    wget -O nvidia-ml.deb 
https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
 && \
+    dpkg -i nvidia-ml.deb && \

Review comment:
       I noticed the following issue with the container though I'm not sure 
about the impact https://gitlab.com/nvidia/container-images/cuda/-/issues/82




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to