[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-26 Thread shaneknapp
Github user shaneknapp commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r236488706
  
--- Diff: dev/run-tests.py ---
@@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism):
     run_cmd(command)
 
 
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report",
+                        "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
+    post_python_tests_results()
+
+
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. "
+              "Unable to post PySpark coverage results.")
+        sys.exit(1)
--- End diff --

sure, i can do that tomorrow (currently heading out for the day).


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r236486298
  
--- Diff: dev/run-tests.py ---
@@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism):
     run_cmd(command)
 
 
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report",
+                        "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
+    post_python_tests_results()
+
+
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. "
+              "Unable to post PySpark coverage results.")
+        sys.exit(1)
--- End diff --

@shaneknapp can you add another environment variable that indicates the PR builder and spark-master-test-sbt-hadoop-2.7, where we're going to run the Python coverage? I can check it and explicitly enable coverage only under that condition.

True; if the condition below (which I checked before at #17669):

```python
(os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
 and os.environ.get("SPARK_BRANCH", "") == "master"
 and os.environ.get("AMPLAB_JENKINS", "") == "true"
 and os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "") == "sbt")
```

is `True` in a Jenkins build or in another user's environment, it might cause some problems (even though that looks quite unlikely).

Similarly, if `AMPLAB_JENKINS` is set in the environment of a user who runs the tests locally, it wouldn't work anyway.
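A minimal sketch of that dedicated opt-in flag, assuming a hypothetical variable name `SPARK_TEST_PYTHON_COVERAGE` that is not actually defined anywhere in this thread:

```python
import os

def python_coverage_enabled():
    # Hypothetical opt-in flag: Jenkins would export
    # SPARK_TEST_PYTHON_COVERAGE=true only on the PR builder and the
    # spark-master-test-sbt-hadoop-2.7 job, so coverage never turns on
    # by accident in local or third-party environments.
    return os.environ.get("SPARK_TEST_PYTHON_COVERAGE", "") == "true"
```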


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-26 Thread shaneknapp
Github user shaneknapp commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r236440252
  
--- Diff: dev/run-tests.py ---
@@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism):
     run_cmd(command)
 
 
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report",
+                        "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
+    post_python_tests_results()
+
+
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. "
+              "Unable to post PySpark coverage results.")
+        sys.exit(1)
--- End diff --

actually, i do agree w/ you @squito ... we need to make sure that the test-running code works both in and out of our jenkins environment.


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-26 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r236431048
  
--- Diff: dev/run-tests.py ---
@@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism):
     run_cmd(command)
 
 
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report",
+                        "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
--- End diff --

this is mostly copied from ~L430, just with "run-tests" -> "run-tests-with-coverage"; could you refactor?


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-26 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r236431733
  
--- Diff: dev/run-tests.py ---
@@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism):
     run_cmd(command)
 
 
+def run_python_tests_with_coverage(test_modules, parallelism):
+    set_title_and_block("Running PySpark tests with coverage report",
+                        "BLOCK_PYSPARK_UNIT_TESTS")
+
+    command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")]
+    if test_modules != [modules.root]:
+        command.append("--modules=%s" % ','.join(m.name for m in test_modules))
+    command.append("--parallelism=%i" % parallelism)
+    run_cmd(command)
+    post_python_tests_results()
+
+
+def post_python_tests_results():
+    if "SPARK_TEST_KEY" not in os.environ:
+        print("[error] 'SPARK_TEST_KEY' environment variable was not set. "
+              "Unable to post PySpark coverage results.")
+        sys.exit(1)
--- End diff --

hmm, this will be a headache for us in our internal builds, as we also run these tests and also set AMPLAB_JENKINS, since it's sort of used as a catch-all for making builds quiet etc., but we obviously won't have this key.

you don't need to cater to our internal builds, of course, but I'm wondering if this will cause a headache for other users who want to run the tests themselves but won't have the key?
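A minimal sketch of the gentler fallback implied here; skipping the upload rather than exiting is my assumption, not the PR's current behavior:

```python
import os

def post_python_tests_results():
    if "SPARK_TEST_KEY" not in os.environ:
        # Assumed alternative to sys.exit(1): warn and skip posting, so that
        # environments without the key can still run the tests to completion.
        print("[warn] 'SPARK_TEST_KEY' environment variable was not set. "
              "Skipping posting of PySpark coverage results.")
        return
    # ... post the coverage results here ...
```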


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r235660674
  
--- Diff: dev/run-tests.py ---
@@ -594,7 +651,18 @@ def main():
 
     modules_with_python_tests = [m for m in test_modules if m.python_test_goals]
     if modules_with_python_tests:
-        run_python_tests(modules_with_python_tests, opts.parallelism)
+        # We only run PySpark tests with coverage report in one specific job with
+        # Spark master with SBT in Jenkins.
+        is_sbt_master_job = (
+            os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
+            and os.environ.get("SPARK_BRANCH", "") == "master"
+            and os.environ.get("AMPLAB_JENKINS", "") == "true"
+            and os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "") == "sbt")
+        is_sbt_master_job = True  # Will remove this right before getting merged.
--- End diff --

I should remove this before getting this in.
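Once the override is removed, the gating presumably reduces to something like the sketch below; the exact branching at the call site is my assumption, since the quoted diff truncates before it:

```python
if is_sbt_master_job:
    run_python_tests_with_coverage(modules_with_python_tests, opts.parallelism)
else:
    run_python_tests(modules_with_python_tests, opts.parallelism)
```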


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23117#discussion_r235644712
  
--- Diff: dev/run-tests.py ---
@@ -594,7 +651,18 @@ def main():
 
     modules_with_python_tests = [m for m in test_modules if m.python_test_goals]
     if modules_with_python_tests:
-        run_python_tests(modules_with_python_tests, opts.parallelism)
+        # We only run PySpark tests with coverage report in one specific job with
+        # Spark master with SBT in Jenkins.
+        is_sbt_master_job = (
+            os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7"
--- End diff --

These environment variables were checked before at https://github.com/apache/spark/pull/17669


---




[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...

2018-11-22 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/23117

[WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins

## What changes were proposed in this pull request?


### Background

As for the current status, the test script that generates coverage information has been merged into Spark: https://github.com/apache/spark/pull/20204

So, we can generate the coverage report and site by, for example:

```
run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

just like the `run-tests` script in `./python`.


### Proposed change

The next step is to host this coverage report via `github.io` automatically by Jenkins (see https://spark-test.github.io/pyspark-coverage-site/).

This uses my testing account for Spark, @spark-test, which was shared with Felix and Shivaram a long time ago for testing purposes, including AppVeyor.

To cut this short, this PR aims to run the coverage job in [spark-master-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/).

In that specific job, it will clone the site and replace it with the up-to-date PySpark test coverage from the latest commit, for instance as below:

```bash
# Clone PySpark coverage site.
git clone https://github.com/spark-test/pyspark-coverage-site.git

# Copy generated coverage HTML.
cp -r .../python/test_coverage/htmlcov/* pyspark-coverage-site/

# Check out to a temporary branch.
git checkout --orphan latest_branch

# Add all the files.
git add -A

# Commit current test coverage results.
git commit -am "Coverage report at latest commit in Apache Spark"

# Delete the old branch.
git branch -D gh-pages

# Rename the temporary branch to gh-pages.
git branch -m gh-pages

# Finally, force update to our repository.
git push -f origin gh-pages
```

So, one single up-to-date coverage report can be shown on the `github.io` page. The commands above were manually tested.
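For illustration, here is a minimal Python sketch of driving those same publishing steps from a script; the helper name, paths, and use of `subprocess` are assumptions rather than what this PR ultimately implements:

```python
import subprocess

def publish_coverage_site(htmlcov_dir, work_dir="/tmp/pyspark-coverage-site"):
    # Mirror the shell steps above: clone the site repo, copy in the freshly
    # generated coverage HTML, and force-push a fresh orphan gh-pages branch.
    subprocess.check_call(
        ["git", "clone",
         "https://github.com/spark-test/pyspark-coverage-site.git", work_dir])
    subprocess.check_call("cp -r %s/* %s" % (htmlcov_dir, work_dir), shell=True)
    for args in (["checkout", "--orphan", "latest_branch"],
                 ["add", "-A"],
                 ["commit", "-m",
                  "Coverage report at latest commit in Apache Spark"],
                 ["branch", "-D", "gh-pages"],
                 ["branch", "-m", "gh-pages"],
                 ["push", "-f", "origin", "gh-pages"]):
        subprocess.check_call(["git"] + args, cwd=work_dir)
```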

### TODO:

- [ ] Write a draft
- [ ] Set a hidden `SPARK_TEST_KEY` for @spark-test's password in Jenkins via Jenkins's feature.
  This should be set both at `SparkPullRequestBuilder`, so that we (or I) can test, and at `spark-master-test-sbt-hadoop-2.7`
- [ ] Make the PR builder's tests pass
- [ ] Enable this build only at spark-master-test-sbt-hadoop-2.7 right before getting this in.

## How was this patch tested?

It will be tested via Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-7721

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23117


commit d88d5aa73db636f8c73ace9f83f339781ea50531
Author: hyukjinkwon 
Date:   2018-11-22T08:08:20Z

Run and generate test coverage report from Python via Jenkins




---
