[GitHub] [spark] HyukjinKwon commented on a change in pull request #29465: [DO-NOT-MERGE][SPARK-32249][2.4] Run Github Actions builds in other branches as well

GitBox Tue, 18 Aug 2020 08:37:28 -0700


HyukjinKwon commented on a change in pull request #29465:
URL: https://github.com/apache/spark/pull/29465#discussion_r472291187




##########
File path: .github/workflows/master.yml
##########
@@ -0,0 +1,233 @@
+name: Build and test
+
+on:
+  push:
+    branches:
+    - branch-2.4
+  pull_request:
+    branches:
+    - branch-2.4
+
+jobs:
+  # Build: build Spark and run the tests for specified modules.
+  build:
+    name: "Build modules: ${{ matrix.modules }} ${{ matrix.comment }} (JDK ${{ 
matrix.java }}, ${{ matrix.hadoop }})"
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        java:
+          - 1.8
+        hadoop:
+          - hadoop2.6
+        # TODO(SPARK-32246): We don't test 'streaming-kinesis-asl' for now.
+        # Kinesis tests depends on external Amazon kinesis service.
+        # Note that the modules below are from sparktestsupport/modules.py.
+        modules:
+          - >-
+            core, unsafe, kvstore, avro,
+            network-common, network-shuffle, repl, launcher,
+            examples, sketch, graphx
+          - >-
+            catalyst, hive-thriftserver
+          - >-
+            streaming, sql-kafka-0-10, streaming-kafka-0-10,
+            mllib-local, mllib,
+            yarn, mesos, kubernetes, hadoop-cloud, spark-ganglia-lgpl,
+            streaming-flume, streaming-flume-sink, streaming-kafka-0-8
+          - >-
+            pyspark-sql, pyspark-mllib
+          - >-
+            pyspark-core, pyspark-streaming, pyspark-ml
+          - >-
+            sparkr
+          - >-
+            sql
+        # Here, we split Hive and SQL tests into some of slow ones and the 
rest of them.
+        included-tags: [""]
+        excluded-tags: [""]
+        comment: [""]
+        include:
+          # Hive tests
+          - modules: hive
+            java: 1.8
+            hadoop: hadoop2.6
+            included-tags: org.apache.spark.tags.SlowHiveTest
+            comment: "- slow tests"
+          - modules: hive
+            java: 1.8
+            hadoop: hadoop2.6
+            excluded-tags: org.apache.spark.tags.SlowHiveTest
+            comment: "- other tests"
+    env:
+      MODULES_TO_TEST: ${{ matrix.modules }}
+      EXCLUDED_TAGS: ${{ matrix.excluded-tags }}
+      INCLUDED_TAGS: ${{ matrix.included-tags }}
+      HADOOP_PROFILE: ${{ matrix.hadoop }}
+      # GitHub Actions' default miniconda to use in pip packaging test.
+      CONDA_PREFIX: /usr/share/miniconda
+      GITHUB_PREV_SHA: ${{ github.event.before }}
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+      # In order to fetch changed files
+      with:
+        fetch-depth: 0
+    # Cache local repositories. Note that GitHub Actions cache has a 2G limit.
+    - name: Cache Scala, SBT, Maven and Zinc
+      uses: actions/cache@v1
+      with:
+        path: build
+        key: build-${{ hashFiles('**/pom.xml') }}
+        restore-keys: |
+          build-
+    - name: Cache Maven local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.m2/repository
+        key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-${{ 
hashFiles('**/pom.xml') }}
+        restore-keys: |
+          ${{ matrix.java }}-${{ matrix.hadoop }}-maven-
+    - name: Cache Ivy local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.ivy2/cache
+        key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ 
hashFiles('**/pom.xml') }}-${{ hashFiles('**/plugins.sbt') }}
+        restore-keys: |
+          ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-
+    - name: Install JDK ${{ matrix.java }}
+      uses: actions/setup-java@v1
+      with:
+        java-version: ${{ matrix.java }}
+    # PySpark
+    - name: Install PyPy3
+      # Note that order of Python installations here matters because default 
python is
+      # overridden.
+      uses: actions/setup-python@v2
+      if: contains(matrix.modules, 'pyspark')
+      with:
+        python-version: pypy3
+        architecture: x64
+    - name: Install Python 3.6
+      uses: actions/setup-python@v2
+      if: contains(matrix.modules, 'pyspark')
+      with:
+        python-version: 3.6
+        architecture: x64
+    - name: Install Python 2.7
+      uses: actions/setup-python@v2
+      # Yarn has a Python specific test too, for example, YarnClusterSuite.
+      if: contains(matrix.modules, 'yarn') || contains(matrix.modules, 
'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 
'sql-'))
+      with:
+        python-version: 2.7
+        architecture: x64
+    - name: Install Python packages (Python 3.6 and PyPy3)
+      if: contains(matrix.modules, 'pyspark')
+      # PyArrow is not supported in PyPy yet, see ARROW-2651.
+      # TODO(SPARK-32247): scipy installation with PyPy fails for an unknown 
reason.
+      run: |
+        python3.6 -m pip install numpy pyarrow pandas scipy xmlrunner
+        python3.6 -m pip list
+        # PyPy does not have xmlrunner
+        pypy3 -m pip install numpy pandas
+        pypy3 -m pip list
+    - name: Install Python packages (Python 2.7)
+      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 
'sql') && !contains(matrix.modules, 'sql-'))
+      run: |
+        python2.7 -m pip install numpy pyarrow pandas scipy xmlrunner
+        python2.7 -m pip list
+    # SparkR
+    - name: Install R 4.0
+      if: contains(matrix.modules, 'sparkr')
+      run: |
+        sudo sh -c "echo 'deb https://cloud.r-project.org/bin/linux/ubuntu 
bionic-cran40/' >> /etc/apt/sources.list"
+        curl -sL 
"https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xE298A3A825C0D65DFD57CBB651716619E084DAB9";
 | sudo apt-key add
+        sudo apt-get update
+        sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
+    - name: Install R packages
+      if: contains(matrix.modules, 'sparkr')
+      run: |
+        # qpdf is required to reduce the size of PDFs to make CRAN check pass. 
See SPARK-32497.
+        sudo apt-get install -y libcurl4-openssl-dev qpdf
+        sudo Rscript -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 
'devtools', 'e1071', 'survival', 'arrow', 'roxygen2'), 
repos='https://cloud.r-project.org/')"
+        # Show installed packages in R.
+        sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, 
c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]'
+    # Run the tests.
+    - name: Run tests
+      run: |
+        # Hive tests become flaky when running in parallel as it's too 
intensive.
+        if [[ "$MODULES_TO_TEST" == "hive" ]]; then export SERIAL_SBT_TESTS=1; 
fi
+        mkdir -p ~/.m2
+        ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" 
--included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS"
+        rm -rf ~/.m2/repository/org/apache/spark
+    - name: Upload test results to report
+      if: always()
+      uses: actions/upload-artifact@v2
+      with:
+        name: test-results-${{ matrix.modules }}-${{ matrix.comment }}-${{ 
matrix.java }}-${{ matrix.hadoop }}
+        path: "**/target/test-reports/*.xml"
+
+  # Static analysis, and documentation build
+  lint:
+    name: Linters, licenses, dependencies and documentation generation
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Cache Maven local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.m2/repository
+        key: docs-maven-repo-${{ hashFiles('**/pom.xml') }}
+        restore-keys: |
+          docs-maven-
+    - name: Install JDK 1.8
+      uses: actions/setup-java@v1
+      with:
+        java-version: 1.8
+    - name: Install Python 3.6

Review comment:
       branch-2.4 does not support Python 3.8 due to the old cloudpickle 
(SPARK-29536).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29465: [DO-NOT-MERGE][SPARK-32249][2.4] Run Github Actions builds in other branches as well

Reply via email to