This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new f2bcc93  [SPARK-32812][PYTHON][TESTS] Avoid initiating a process during the main process for run-tests.py
f2bcc93 is described below

commit f2bcc9349d86be71dba491b8348ac8d83f0764a8
Author: itholic <haejoon...@naver.com>
AuthorDate: Tue Sep 8 12:22:13 2020 +0900

[SPARK-32812][PYTHON][TESTS] Avoid initiating a process during the main process for run-tests.py

### What changes were proposed in this pull request?

In certain environments, the `run-tests.py` script seems to fail as below:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
...
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
...
    raise EOFError
EOFError
```

The reason is that `Manager.dict()` launches another process when the main process is initiated. It works in most environments for an unknown reason, but it is better to avoid such a pattern, as Python itself advises.

### Why are the changes needed?

To prevent the test failure for Python.

### Does this PR introduce _any_ user-facing change?

No, it fixes a test script.

### How was this patch tested?

Manually ran the script after fixing.

```
Running PySpark tests.
Output is in /.../python/unit-tests.log
Will test against the following Python executables: ['/.../python3', 'python3.8']
Will test the following Python tests: ['pyspark.sql.dataframe']
/.../python3 python_implementation is CPython
/.../python3 version is: Python 3.8.5
python3.8 python_implementation is CPython
python3.8 version is: Python 3.8.5
Starting test(/.../python3): pyspark.sql.dataframe
Starting test(python3.8): pyspark.sql.dataframe
Finished test(/.../python3): pyspark.sql.dataframe (33s)
Finished test(python3.8): pyspark.sql.dataframe (34s)
Tests passed in 34 seconds
```

Closes #29666 from itholic/SPARK-32812.

Authored-by: itholic <haejoon...@naver.com>
Signed-off-by: HyukjinKwon <gurwls...@apache.org>
(cherry picked from commit c8c082ce380b2357623511c6625503fb3f1d65bf)
Signed-off-by: HyukjinKwon <gurwls...@apache.org>
---
 python/run-tests.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/python/run-tests.py b/python/run-tests.py
index c34e48a..9a95c96 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -53,7 +53,7 @@ def print_red(text):
     print('\033[31m' + text + '\033[0m')
 
 
-SKIPPED_TESTS = Manager().dict()
+SKIPPED_TESTS = None
 LOG_FILE = os.path.join(SPARK_HOME, "python/unit-tests.log")
 FAILURE_REPORTING_LOCK = Lock()
 LOGGER = logging.getLogger()
@@ -141,6 +141,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
                     skipped_counts = len(skipped_tests)
                     if skipped_counts > 0:
                         key = (pyspark_python, test_name)
+                        assert SKIPPED_TESTS is not None
                         SKIPPED_TESTS[key] = skipped_tests
                     per_test_output.close()
                 except:
@@ -293,4 +294,5 @@ def main():
 
 
 if __name__ == "__main__":
+    SKIPPED_TESTS = Manager().dict()
     main()