Yikun commented on a change in pull request #32867:
URL: https://github.com/apache/spark/pull/32867#discussion_r654522707



##########
File path: dev/sparktestsupport/modules.py
##########
@@ -58,14 +95,19 @@ def __init__(self, name, dependencies, source_file_regexes, build_profile_flags=
             is not explicitly changed.
         :param should_run_r_tests: If true, changes in this module will trigger all R tests.
         :param should_run_build_tests: If true, changes in this module will trigger build tests.
+        :param python_test_paths: A set of python test paths to be discovered
+        :param python_excluded_tests: A set of excluded Python tests
         """
         self.name = name
         self.dependencies = dependencies
         self.source_file_prefixes = source_file_regexes
         self.sbt_test_goals = sbt_test_goals
         self.build_profile_flags = build_profile_flags
         self.environ = environ or {}
-        self.python_test_goals = python_test_goals
+        discovered_goals = _discover_python_unittests(python_test_paths)
+        # Final goals = Manually specified goals + Discovered goals - Excluded goals
+        all_goals = set(python_test_goals) | set(discovered_goals)
+        self.python_test_goals = sorted(list(set(all_goals) - set(python_excluded_tests)))

Review comment:
       Yes, great suggestion; it makes the module implementation much cleaner.
   
   If we want to **only** discover the slow tests:
   ```
   python_test_goals = [
       ...
   ] + _discover_python_unittests("pyspark/tests", discover_slow=True)
   ```
   Otherwise, slow tests will be skipped.
   
   I'll complete it in my next commit. :)
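   
   For context, here is a minimal sketch of what such a discovery helper could look like. The function name comes from the PR, but the body, the `discover_slow` parameter, and the `is_slow_test` class marker are illustrative assumptions, not the PR's actual code:
   ```
   import unittest


   def _iter_cases(suite):
       """Flatten a (possibly nested) TestSuite into individual test cases."""
       for item in suite:
           if isinstance(item, unittest.TestSuite):
               yield from _iter_cases(item)
           else:
               yield item


   def _discover_python_unittests(path, discover_slow=False):
       """Return sorted test goals (module names such as
       'pyspark.tests.test_util') for every case discovered under `path`."""
       loader = unittest.TestLoader()
       goals = set()
       for case in _iter_cases(loader.discover(path, pattern="test_*.py")):
           # Hypothetical marker: slow test classes set `is_slow_test = True`
           # and are only collected when discover_slow is requested.
           if getattr(case, "is_slow_test", False) and not discover_slow:
               continue
           goals.add(type(case).__module__)
       return sorted(goals)
   ```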

##########
File path: dev/sparktestsupport/modules.py
##########
@@ -488,12 +495,10 @@ def __hash__(self):
     python_test_goals=[
         # doctests
         "pyspark.streaming.util",

Review comment:
       Frankly, not yet, but I'd like to start doing some investigation. I started this unittest-discovery PR after forgetting, many times, to add new unittests to the module list. ^_^
   
   And `doctests` discovery is a bit special: doctests do not share a common base class (the way the unittests subclass `unittest.TestCase`), so there look to be two potential solutions (a rough sketch of option 1 follows the list):
   
   1. Flag based: add a flag like `_doc_test = True` to every test we want to run.
   2. Special-name based: discover which modules contain a `_doctest` method (we would rename `_test` to `_doctest` to make the doctest entry point unique).
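   
   A rough sketch of the flag-based option (the `_doc_test = True` marker and the helper below are hypothetical, for illustration only):
   ```
   import importlib
   import pkgutil


   def _discover_doctest_modules(package_name):
       """Collect modules that opt in to doctest discovery via a
       module-level `_doc_test = True` flag (hypothetical marker)."""
       package = importlib.import_module(package_name)
       discovered = []
       # Importing every submodule just to read a flag is heavy, but it
       # keeps the discovery logic trivial.
       for info in pkgutil.walk_packages(package.__path__, package_name + "."):
           module = importlib.import_module(info.name)
           if getattr(module, "_doc_test", False):
               discovered.append(info.name)
       return sorted(discovered)
   ```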

##########
File path: dev/sparktestsupport/modules.py
##########
@@ -488,12 +495,10 @@ def __hash__(self):
     python_test_goals=[
         # doctests
         "pyspark.streaming.util",

Review comment:
       BTW, I'm not sure this is the right place for this discussion. :)
   
   Do you think the "footer" approach in the Python code (executing doctests or unittests from `__main__`, with the boilerplate repeated in many files) is good enough? Have we considered re-architecting the testing framework (or is there already some work on that)? The common footer shape is sketched below for reference.
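   
   For readers following the thread, the "footer" in question is the `_test()` block that many PySpark modules repeat at the bottom of the file, roughly of this shape (simplified; real footers typically also set up a SparkContext/SparkSession):
   ```
   import doctest
   import sys


   def _test():
       import pyspark.streaming.util

       # Run this module's doctests; the only per-file differences are the
       # module imported and any Spark setup placed into the globals.
       globs = pyspark.streaming.util.__dict__.copy()
       (failure_count, test_count) = doctest.testmod(
           pyspark.streaming.util, globs=globs, optionflags=doctest.ELLIPSIS
       )
       if failure_count:
           sys.exit(-1)


   if __name__ == "__main__":
       _test()
   ```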



