[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...
Github user shaneknapp commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236488706

```diff
--- Diff: dev/run-tests.py ---
@@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism):
     run_cmd(command)
 
 
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report", "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
+    post_python_tests_results()
+
+
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post"
+              " PySpark coverage results.")
+        sys.exit(1)
```
--- End diff --

sure, i can do that tomorrow (currently heading out for the day).

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236486298

```diff
--- Diff: dev/run-tests.py ---
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post"
+              " PySpark coverage results.")
+        sys.exit(1)
```
(same hunk as quoted above; this comment is on the `sys.exit(1)` line)
--- End diff --

@shaneknapp can you add another environment variable that indicates the PR builder and spark-master-test-sbt-hadoop-2.7, where we're going to run Python coverage? I can check it and explicitly enable it only under that condition. True — if the condition below (which I checked before in #17669):

```python
os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
and os.environ.get("SPARK_BRANCH", "") == "master"
and os.environ.get("AMPLAB_JENKINS", "") == "true"
and os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "") == "sbt"
```

is `True` in a Jenkins build or in another user's environment, it might cause some problems (even though that looks quite unlikely). Similarly, if `AMPLAB_JENKINS` is set in the environment of a user who runs the tests locally, it wouldn't work anyway.
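One way to realize the dedicated environment variable proposed here could look like the following sketch. The variable name `SPARK_TEST_PYSPARK_COVERAGE` is an assumption for illustration, not something the Jenkins jobs actually export:

```python
import os


def is_coverage_job():
    # Hypothetical sketch of the dedicated flag suggested in this thread:
    # the Jenkins job configuration (spark-master-test-sbt-hadoop-2.7, and
    # the PR builder while testing) would export one explicit variable
    # instead of the job being inferred from four separate ones.
    # SPARK_TEST_PYSPARK_COVERAGE is a made-up name.
    return os.environ.get("SPARK_TEST_PYSPARK_COVERAGE", "") == "true"
```

With an explicit opt-in flag, local and internal builds that happen to set `AMPLAB_JENKINS` would no longer trip the coverage path by accident.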
Github user shaneknapp commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236440252

```diff
--- Diff: dev/run-tests.py ---
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post"
+              " PySpark coverage results.")
+        sys.exit(1)
```
(same hunk as quoted above)
--- End diff --

actually, i do agree w/you @squito ... we need to make sure that the test running code works both in and out of our jenkins environment.
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236431048

```diff
--- Diff: dev/run-tests.py ---
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report", "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
```
--- End diff --

this is mostly copied from ~L430, just `"run-tests"` -> `"run-tests-with-coverage"`, could you refactor?
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236431733

```diff
--- Diff: dev/run-tests.py ---
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post"
+              " PySpark coverage results.")
+        sys.exit(1)
```
(same hunk as quoted above)
--- End diff --

hmm, this will be a headache for us in our internal builds, as we also run these tests, and also set AMPLAB_JENKINS as it's sort of used as a catch-all for making builds quiet etc., but we won't have this key obviously. you don't need to cater to our internal builds, of course, but I'm wondering if this will cause a headache for more users who want to run tests themselves but won't have the key?
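If the decision were to not hard-fail, a soft-skip variant could address the concern above. This is only a sketch of one option, not the PR's actual behavior — whether to warn-and-skip or `sys.exit(1)` is exactly the trade-off under discussion:

```python
import os


def post_python_tests_results():
    # Sketch of a softer failure mode: builds without the Jenkins
    # credential (e.g. internal or local runs that happen to set
    # AMPLAB_JENKINS) skip posting instead of failing the whole test run.
    if "SPARK_TEST_KEY" not in os.environ:
        print("[warn] 'SPARK_TEST_KEY' was not set; skipping PySpark "
              "coverage upload.")
        return False
    # ... cloning and pushing the coverage site would go here ...
    return True
```

The eventual resolution in the thread (gating on a dedicated Jenkins-only condition) makes the hard exit reachable only on the machines that are supposed to have the key.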
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r235660674

```diff
--- Diff: dev/run-tests.py ---
@@ -594,7 +651,18 @@ def main():
     modules_with_python_tests = [m for m in test_modules if m.python_test_goals]
     if modules_with_python_tests:
-        run_python_tests(modules_with_python_tests, opts.parallelism)
+        # We only run PySpark tests with coverage report in one specific job with
+        # Spark master with SBT in Jenkins.
+        is_sbt_master_job = (
+            os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
+            and os.environ.get("SPARK_BRANCH", "") == "master"
+            and os.environ.get("AMPLAB_JENKINS", "") == "true"
+            and os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "") == "sbt")
+        is_sbt_master_job = True  # Will remove this right before getting merged.
```
--- End diff --

I should remove this before getting this in.
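A hardcoded `is_sbt_master_job = True` has to be remembered and deleted by hand before merge. One alternative sketch makes the temporary override opt-in via the environment instead, so nothing needs removing; the `SPARK_FORCE_COVERAGE` name is hypothetical:

```python
import os


def is_sbt_master_job():
    # Same condition as in the diff above, but the temporary override is an
    # opt-in environment variable (SPARK_FORCE_COVERAGE, a made-up name)
    # rather than a hardcoded `True` that must be stripped before merge.
    if os.environ.get("SPARK_FORCE_COVERAGE", "") == "true":
        return True
    return (
        os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
        and os.environ.get("SPARK_BRANCH", "") == "master"
        and os.environ.get("AMPLAB_JENKINS", "") == "true"
        and os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "") == "sbt")
```

During review, the PR builder job could export the override; merging the code as-is would leave the condition correctly strict.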
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r235644712

```diff
--- Diff: dev/run-tests.py ---
+        # We only run PySpark tests with coverage report in one specific job with
+        # Spark master with SBT in Jenkins.
+        is_sbt_master_job = (
+            os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
```
(same hunk as quoted above; this comment is on the `AMPLAB_JENKINS_BUILD_PROFILE` line)
--- End diff --

These environment variables were checked before at https://github.com/apache/spark/pull/17669
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23117

[WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins

## What changes were proposed in this pull request?

### Background

For the current status, the test script that generates coverage information was merged into Spark: https://github.com/apache/spark/pull/20204

So we can generate the coverage report and site, for example, by:

```
run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

like the `run-tests` script in `./python`.

### Proposed change

The next step is to host this coverage report via `github.io` automatically by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/). This uses my testing account for Spark, @spark-test, which was shared with Felix and Shivaram a long time ago for testing purposes, including AppVeyor.

To cut this short, this PR targets running the coverage in [spark-master-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/). In that specific job, it will clone the page and rebase the up-to-date PySpark test coverage from the latest commit, for instance as below:

```bash
# Clone PySpark coverage site.
git clone https://github.com/spark-test/pyspark-coverage-site.git

# Copy generated coverage HTML.
cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/

# Check out to a temporary branch.
git checkout --orphan latest_branch

# Add all the files.
git add -A

# Commit current test coverage results.
git commit -am "Coverage report at latest commit in Apache Spark"

# Delete the old branch.
git branch -D gh-pages

# Rename the temporary branch to gh-pages.
git branch -m gh-pages

# Finally, force-push to our repository.
git push -f origin gh-pages
```

So one single, up-to-date coverage report can be shown on the `github.io` page. The commands above were manually tested.
### TODO:

- [ ] Write a draft
- [ ] Set a hidden `SPARK_TEST_KEY` for @spark-test's password in Jenkins via Jenkins's feature. This should be set both at `SparkPullRequestBuilder`, so that we (or I) can test, and at `spark-master-test-sbt-hadoop-2.7`
- [ ] Make the PR builder's tests pass
- [ ] Enable this only at spark-master-test-sbt-hadoop-2.7 right before getting this in

## How was this patch tested?

It will be tested via Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-7721

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23117.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23117

----

commit d88d5aa73db636f8c73ace9f83f339781ea50531
Author: hyukjinkwon
Date: 2018-11-22T08:08:20Z

    Run and generate test coverage report from Python via Jenkins