nchammas commented on code in PR #43953:
URL: https://github.com/apache/spark/pull/43953#discussion_r1426263880

##########
dev/infra/Dockerfile:
##########
@@ -139,3 +139,60 @@ RUN python3.12 -m pip install 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf
 # TODO(SPARK-46078) Use official one instead of nightly build when it's ready
 RUN python3.12 -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
 RUN python3.12 -m pip install torcheval
+
+
+# Refer to https://github.com/ContinuumIO/docker-images/blob/main/miniconda3/debian/Dockerfile
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh -q && \
+    bash miniconda.sh -b -p /opt/miniconda3 && \
+    rm miniconda.sh && \
+    ln -s /opt/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
+    ln -s /opt/miniconda3/bin/conda /usr/local/bin/conda && \
+    find /opt/miniconda3/ -follow -type f -name '*.a' -delete && \
+    find /opt/miniconda3/ -follow -type f -name '*.js.map' -delete && \
+    conda clean -afy
+
+# Additional Python deps for linter and documentation, delete this section if another Python version is used
+# Since there maybe conflicts between envs, here uses conda to manage it.
+# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
+# See also https://github.com/sphinx-doc/sphinx/issues/7551.
+# Jinja2 3.0.0+ causes error when building with Sphinx.
+# See also https://issues.apache.org/jira/browse/SPARK-35375.
+RUN conda create -n doc python=3.9
+
+RUN conda run -n doc pip install \

Review Comment:
Why are we listing out individual dependencies here vs. listing them in a requirements file (or equivalent) that we can use across Docker, GitHub Actions, and local build scripts? Don't we want our build and test dependencies to be consistent?

I made a past attempt at this over in #27928. It failed because, in addition to building a shared set of build and test dependencies, it also pinned transitive build and test dependencies, which the reviewers weren't keen on. But we can separate the two ideas from each other.

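As a rough illustration of the shared-file idea (a sketch only, not taken from this PR): the path `dev/requirements.txt` is an assumed location for the shared file, `python3` stands in for whichever interpreter the image should use, and the file must sit inside the Docker build context for `COPY` to work.

```dockerfile
# Sketch under stated assumptions: dev/requirements.txt is a hypothetical
# path for the shared list of build and test dependencies.
COPY dev/requirements.txt /tmp/requirements.txt

# Install everything from the shared file with plain pip instead of
# listing individual packages in the Dockerfile.
RUN python3 -m pip install -r /tmp/requirements.txt

# GitHub Actions and local build scripts would point at the same file:
#   python3 -m pip install -r dev/requirements.txt
```

With this shape, adding or bumping a dependency means editing one file rather than every place that installs packages.
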
IMO there should be a single requirements file for build and test dependencies (whether or not it pins transitive dependencies is a separate issue), and that file should be used everywhere. What do you think?

I also don't follow why we need to pull in conda. What is it getting us over vanilla pip?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org