nchammas commented on code in PR #43953:
URL: https://github.com/apache/spark/pull/43953#discussion_r1426263880


##########
dev/infra/Dockerfile:
##########
@@ -139,3 +139,60 @@ RUN python3.12 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf
 # TODO(SPARK-46078) Use official one instead of nightly build when it's ready
 RUN python3.12 -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
 RUN python3.12 -m pip install torcheval
+
+
+# Refer to https://github.com/ContinuumIO/docker-images/blob/main/miniconda3/debian/Dockerfile
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh -q && \
+    bash miniconda.sh -b -p /opt/miniconda3 && \
+    rm miniconda.sh && \
+    ln -s /opt/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
+    ln -s /opt/miniconda3/bin/conda /usr/local/bin/conda && \
+    find /opt/miniconda3/ -follow -type f -name '*.a' -delete && \
+    find /opt/miniconda3/ -follow -type f -name '*.js.map' -delete && \
+    conda clean -afy
+
+# Additional Python deps for linter and documentation, delete this section if another Python version is used
+# Since there maybe conflicts between envs, here uses conda to manage it.
+# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
+#   See also https://github.com/sphinx-doc/sphinx/issues/7551.
+# Jinja2 3.0.0+ causes error when building with Sphinx.
+#   See also https://issues.apache.org/jira/browse/SPARK-35375.
+RUN conda create -n doc python=3.9
+
+RUN conda run -n doc pip install \

Review Comment:
   Why are we listing individual dependencies here instead of putting them in a 
requirements file (or equivalent) that we can share across Docker, GitHub 
Actions, and local build scripts? Don't we want our build and test dependencies 
to be consistent?
   
   I made a past attempt at this over in #27928. It failed because, in addition 
to defining a shared set of build and test dependencies, it also pinned the 
transitive build and test dependencies, which the reviewers weren't keen on. 
But the two ideas can be separated from each other.
   
   IMO there should be a single requirements file for build and test 
dependencies (whether or not it pins transitive dependencies is a separate 
issue), and that file should be used everywhere. What do you think?
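   
   As a rough sketch of the idea (the file name, its presence in the Docker 
build context, and the `python3` interpreter are all assumptions for 
illustration, not something this PR does), the Docker side could shrink to 
something like:
   
   ```dockerfile
   # Hypothetical: assumes a shared requirements.txt is available in the
   # Docker build context. The same file would be passed to
   # `pip install -r` in GitHub Actions and in local build scripts.
   COPY requirements.txt /tmp/requirements.txt
   RUN python3 -m pip install -r /tmp/requirements.txt
   ```
   
   With that layout, a version bump would touch one file rather than each 
place that installs dependencies.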
   
   I also don't follow why we need to pull in conda. What is it getting us over 
vanilla pip?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

