Repository: spark
Updated Branches:
  refs/heads/master 7143e9d72 -> 7e3eb3cd2


[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script

## What changes were proposed in this pull request?

This PR proposes adding a developer option, `--testnames`, to our testing script
so that a specific set of unittests and doctests can be run.
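
Each `--testnames` entry is simply split on whitespace and appended to the `bin/pyspark` command line (see the `run-tests.py` diff below), so a class or method name becomes an extra positional argument to the test module. A minimal sketch of that mechanism, not the exact script code; `run_test_goal` is a hypothetical helper, and `SPARK_TESTING=1` follows the old comment removed from `run-tests-with-coverage`:

```python
import os
import subprocess

# Hypothetical helper illustrating how one --testnames entry is invoked:
# 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' is split on
# whitespace and appended after bin/pyspark.
def run_test_goal(spark_home, test_name):
    env = dict(os.environ, SPARK_TESTING="1")  # assumed testing flag
    cmd = [os.path.join(spark_home, "bin/pyspark")] + test_name.split()
    return subprocess.call(cmd, env=env)

# Example (path is illustrative):
# run_test_goal("/path/to/spark", "pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion")
```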

**1. Run unittests in the class**

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests']
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (14s)
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (14s) ... 22 tests were skipped
Tests passed in 14 seconds

Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_enabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped
...
```

**2. Run single unittest in the class.**

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (0s) ... 1 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (8s)
Tests passed in 8 seconds

Skipped tests in pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion with pypy:
    test_null_conversion (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```

**3. Run doctests in single PySpark module.**

```bash
./run-tests --testnames pyspark.sql.dataframe
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.dataframe
Finished test(python2.7): pyspark.sql.dataframe (47s)
Finished test(pypy): pyspark.sql.dataframe (48s)
Tests passed in 48 seconds
```

Of course, you can mix them:

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe'
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests', 'pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Starting test(python2.7): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (0s) ... 22 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (18s)
Finished test(python2.7): pyspark.sql.dataframe (50s)
Finished test(pypy): pyspark.sql.dataframe (52s)
Tests passed in 52 seconds

Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```

You can also use all the other options (except `--modules`, which is ignored when `--testnames` is given):

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' --python-executables=python
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (12s)
Tests passed in 12 seconds
```

See help below:

```bash
./run-tests --help
```

```
Usage: run-tests [options]

Options:
...
  Developer Options:
    --testnames=TESTNAMES
                        A comma-separated list of specific modules, classes
                        and functions of doctest or unittest to test. For
                        example, 'pyspark.sql.foo' to run the module as
                        unittests or doctests, 'pyspark.sql.tests FooTests' to
                        run the specific class of unittests,
                        'pyspark.sql.tests FooTests.test_foo' to run the
                        specific unittest in the class. '--modules' option is
                        ignored if they are given.
```
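
The class and method forms work because PySpark's test modules end with a standard `unittest` entry point, which treats extra positional arguments as test names. A hedged, self-contained illustration with a hypothetical `FooTests` module (the real test modules rely on the same argv behavior, though their runner setup may differ):

```python
# foo_tests.py -- hypothetical stand-in for a PySpark test module.
# `python foo_tests.py FooTests.test_foo` runs only that test, because
# unittest.main() interprets positional argv entries as test names.
import unittest


class FooTests(unittest.TestCase):
    def test_foo(self):
        self.assertEqual(1 + 1, 2)

    def test_bar(self):
        self.assertIn("a", "bar")


if __name__ == "__main__":
    unittest.main()
```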

I intentionally grouped it as a developer option to be more conservative.

## How was this patch tested?

Manually tested. Negative tests were also done.

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion1' --python-executables=python
```

```
...
AttributeError: type object 'ArrowTests' has no attribute 'test_null_conversion1'
...
```

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowT' --python-executables=python
```

```
...
AttributeError: 'module' object has no attribute 'ArrowT'
...
```

```bash
./run-tests --testnames 'pyspark.sql.tests.test_ar' --python-executables=python
```
```
...
/.../python2.7: No module named pyspark.sql.tests.test_ar
```

Closes #23203 from HyukjinKwon/SPARK-26252.

Authored-by: Hyukjin Kwon <gurwls...@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e3eb3cd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e3eb3cd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e3eb3cd

Branch: refs/heads/master
Commit: 7e3eb3cd209d83394ca2b2cec79b26b1bbe9d7ea
Parents: 7143e9d
Author: Hyukjin Kwon <gurwls...@apache.org>
Authored: Wed Dec 5 15:22:08 2018 +0800
Committer: Hyukjin Kwon <gurwls...@apache.org>
Committed: Wed Dec 5 15:22:08 2018 +0800

----------------------------------------------------------------------
 python/run-tests-with-coverage |  2 --
 python/run-tests.py            | 68 +++++++++++++++++++++++++------------
 2 files changed, 46 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/7e3eb3cd/python/run-tests-with-coverage
----------------------------------------------------------------------
diff --git a/python/run-tests-with-coverage b/python/run-tests-with-coverage
index 6d74b56..4578210 100755
--- a/python/run-tests-with-coverage
+++ b/python/run-tests-with-coverage
@@ -50,8 +50,6 @@ export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
 # This environment variable enables the coverage.
 export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
 
-# If you'd like to run a specific unittest class, you could do such as
-# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
 ./run-tests "$@"
 
 # Don't run coverage for the coverage command itself

http://git-wip-us.apache.org/repos/asf/spark/blob/7e3eb3cd/python/run-tests.py
----------------------------------------------------------------------
diff --git a/python/run-tests.py b/python/run-tests.py
index 01a6e81..e45268c 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -19,7 +19,7 @@
 
 from __future__ import print_function
 import logging
-from optparse import OptionParser
+from optparse import OptionParser, OptionGroup
 import os
 import re
 import shutil
@@ -99,7 +99,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
-            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
+            [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),
             stderr=per_test_output, stdout=per_test_output, env=env).wait()
         shutil.rmtree(tmp_dir, ignore_errors=True)
     except:
@@ -190,6 +190,20 @@ def parse_opts():
         help="Enable additional debug logging"
     )
 
+    group = OptionGroup(parser, "Developer Options")
+    group.add_option(
+        "--testnames", type="string",
+        default=None,
+        help=(
+            "A comma-separated list of specific modules, classes and functions 
of doctest "
+            "or unittest to test. "
+            "For example, 'pyspark.sql.foo' to run the module as unittests or 
doctests, "
+            "'pyspark.sql.tests FooTests' to run the specific class of 
unittests, "
+            "'pyspark.sql.tests FooTests.test_foo' to run the specific 
unittest in the class. "
+            "'--modules' option is ignored if they are given.")
+    )
+    parser.add_option_group(group)
+
     (opts, args) = parser.parse_args()
     if args:
         parser.error("Unsupported arguments: %s" % ' '.join(args))
@@ -213,25 +227,31 @@ def _check_coverage(python_exec):
 
 def main():
     opts = parse_opts()
-    if (opts.verbose):
+    if opts.verbose:
         log_level = logging.DEBUG
     else:
         log_level = logging.INFO
+    should_test_modules = opts.testnames is None
     logging.basicConfig(stream=sys.stdout, level=log_level, format="%(message)s")
     LOGGER.info("Running PySpark tests. Output is in %s", LOG_FILE)
     if os.path.exists(LOG_FILE):
         os.remove(LOG_FILE)
     python_execs = opts.python_executables.split(',')
-    modules_to_test = []
-    for module_name in opts.modules.split(','):
-        if module_name in python_modules:
-            modules_to_test.append(python_modules[module_name])
-        else:
-            print("Error: unrecognized module '%s'. Supported modules: %s" %
-                  (module_name, ", ".join(python_modules)))
-            sys.exit(-1)
     LOGGER.info("Will test against the following Python executables: %s", 
python_execs)
-    LOGGER.info("Will test the following Python modules: %s", [x.name for x in 
modules_to_test])
+
+    if should_test_modules:
+        modules_to_test = []
+        for module_name in opts.modules.split(','):
+            if module_name in python_modules:
+                modules_to_test.append(python_modules[module_name])
+            else:
+                print("Error: unrecognized module '%s'. Supported modules: %s" 
%
+                      (module_name, ", ".join(python_modules)))
+                sys.exit(-1)
+        LOGGER.info("Will test the following Python modules: %s", [x.name for 
x in modules_to_test])
+    else:
+        testnames_to_test = opts.testnames.split(',')
+        LOGGER.info("Will test the following Python tests: %s", 
testnames_to_test)
 
     task_queue = Queue.PriorityQueue()
     for python_exec in python_execs:
@@ -246,16 +266,20 @@ def main():
         LOGGER.debug("%s python_implementation is %s", python_exec, 
python_implementation)
         LOGGER.debug("%s version is: %s", python_exec, subprocess_check_output(
             [python_exec, "--version"], stderr=subprocess.STDOUT, 
universal_newlines=True).strip())
-        for module in modules_to_test:
-            if python_implementation not in 
module.blacklisted_python_implementations:
-                for test_goal in module.python_test_goals:
-                    heavy_tests = ['pyspark.streaming.tests', 
'pyspark.mllib.tests',
-                                   'pyspark.tests', 'pyspark.sql.tests', 
'pyspark.ml.tests']
-                    if any(map(lambda prefix: test_goal.startswith(prefix), 
heavy_tests)):
-                        priority = 0
-                    else:
-                        priority = 100
-                    task_queue.put((priority, (python_exec, test_goal)))
+        if should_test_modules:
+            for module in modules_to_test:
+                if python_implementation not in module.blacklisted_python_implementations:
+                    for test_goal in module.python_test_goals:
+                        heavy_tests = ['pyspark.streaming.tests', 'pyspark.mllib.tests',
+                                       'pyspark.tests', 'pyspark.sql.tests', 'pyspark.ml.tests']
+                        if any(map(lambda prefix: test_goal.startswith(prefix), heavy_tests)):
+                            priority = 0
+                        else:
+                            priority = 100
+                        task_queue.put((priority, (python_exec, test_goal)))
+        else:
+            for test_goal in testnames_to_test:
+                task_queue.put((0, (python_exec, test_goal)))
 
     # Create the target directory before starting tasks to avoid races.
     target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))

