This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 227611d70d52 [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
227611d70d52 is described below

commit 227611d70d5293bbb5d67b62af649e3bf36eaec6
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Tue Jan 23 10:55:01 2024 +0900

[SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script

### What changes were proposed in this pull request?

This PR cleans up obsolete code in the PySpark coverage script.

### Why are the changes needed?

We used to use `coverage_daemon.py` as the Python worker daemon to track coverage on the Python worker side (e.g., coverage within Python UDFs); it was added in https://github.com/apache/spark/pull/20204. However, it seems it no longer works, and in fact it stopped working multiple years ago. Replacing the Python worker daemon itself was a rather hacky workaround in the first place. We should just get rid of it first and then find a proper approach (a sketch of coverage.py's standard per-process hook is appended after the diff at the end of this message). This should also deflake the scheduled jobs and speed up the build.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested via:

```bash
./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
```

```
...
Finished test(python3): pyspark.sql.tests.test_functions (87s)
Tests passed in 87 seconds
Combining collected coverage data under ...
Creating XML report file at python/coverage.xml
Wrote XML report to coverage.xml
Reporting the coverage data at /.../spark/python/test_coverage/coverage_data/coverage
Name                                      Stmts   Miss Branch BrPart  Cover
-------------------------------------------------------------------------
pyspark/__init__.py                          48      7     10      3    76%
pyspark/_globals.py                          16      3      4      2    75%
pyspark/accumulators.py                     123     38     26      5    66%
pyspark/broadcast.py                        121     79     40      3    33%
pyspark/conf.py                              99     33     50      5    64%
pyspark/context.py                          451    216    151     26    51%
pyspark/errors/__init__.py                    3      0      0      0   100%
pyspark/errors/error_classes.py               3      0      0      0   100%
pyspark/errors/exceptions/__init__.py         0      0      0      0   100%
pyspark/errors/exceptions/base.py            91     15     24      4    83%
pyspark/errors/exceptions/captured.py       168     81     57     17    48%
pyspark/errors/utils.py                      34      8      6      2    70%
pyspark/files.py                             34     15     12      3    57%
pyspark/find_spark_home.py                   30     24     12      2    19%
pyspark/java_gateway.py                     114     31     30     12    69%
pyspark/join.py                              66     58     58      0     6%
pyspark/profiler.py                         244    182     92      3    22%
pyspark/rdd.py                             1064    741    378      9    27%
pyspark/rddsampler.py                        68     50     32      0    18%
pyspark/resource/__init__.py                  5      0      0      0   100%
pyspark/resource/information.py              11      4      4      0    73%
pyspark/resource/profile.py                 110     82     58      1    27%
pyspark/resource/requests.py                139     90     70      0    35%
pyspark/resultiterable.py                    14      6      2      1    56%
pyspark/serializers.py                      349    185     90     13    43%
pyspark/shuffle.py                          397    322    180      1    13%
pyspark/sql/__init__.py                      14      0      0      0   100%
pyspark/sql/catalog.py                      203    127     66      2    30%
pyspark/sql/column.py                       268     78     64     12    67%
pyspark/sql/conf.py                          40     16     10      3    58%
pyspark/sql/context.py                      170     95     58      2    47%
pyspark/sql/dataframe.py                    900    475    459     40    45%
pyspark/sql/functions/__init__.py             3      0      0      0   100%
pyspark/sql/functions/builtin.py           1741    542   1126     26    76%
pyspark/sql/functions/partitioning.py        41     19     18      3    59%
pyspark/sql/group.py                         81     30     32      3    65%
pyspark/sql/observation.py                   54     37     22      1    26%
pyspark/sql/pandas/__init__.py                1      0      0      0   100%
pyspark/sql/pandas/conversion.py            277    249    156      2     8%
pyspark/sql/pandas/functions.py              67     49     34      0    18%
pyspark/sql/pandas/group_ops.py              89     65     22      2    25%
pyspark/sql/pandas/map_ops.py                37     27     10      2    26%
pyspark/sql/pandas/serializers.py           381    323    172      0    10%
pyspark/sql/pandas/typehints.py              41     32     26      1    15%
pyspark/sql/pandas/types.py                 407    383    326      1     3%
pyspark/sql/pandas/utils.py                  29     11     10      5    59%
pyspark/sql/profiler.py                      80     47     54      1    39%
pyspark/sql/readwriter.py                   362    253    146      7    27%
pyspark/sql/session.py                      469    206    228     22    56%
pyspark/sql/sql_formatter.py                 41     26     16      1    28%
pyspark/sql/streaming/__init__.py             4      0      0      0   100%
pyspark/sql/streaming/listener.py           400    200    186      1    61%
pyspark/sql/streaming/query.py              102     63     40      1    39%
pyspark/sql/streaming/readwriter.py         268    207    118      2    21%
pyspark/sql/streaming/state.py              100     68     44      0    29%
pyspark/sql/tests/__init__.py                 0      0      0      0   100%
pyspark/sql/tests/test_functions.py         646      2    244      7    99%
pyspark/sql/types.py                       1013    355    528     74    62%
pyspark/sql/udf.py                          240    132     90     20    42%
pyspark/sql/udtf.py                         152     98     52      2    33%
pyspark/sql/utils.py                        160     83     54     10    45%
pyspark/sql/window.py                        89     23     56      5    77%
pyspark/statcounter.py                       79     58     20      0    21%
pyspark/status.py                            36     13      6      0    55%
pyspark/storagelevel.py                      41      9      0      0    78%
pyspark/taskcontext.py                      111     63     40      1    40%
pyspark/testing/__init__.py                   2      0      0      0   100%
pyspark/testing/sqlutils.py                 149     44     52      1    75%
pyspark/testing/utils.py                    312    238    162      2    17%
pyspark/traceback_utils.py                   38      4     14      6    81%
pyspark/util.py                             153    120     56      2    18%
pyspark/version.py                            1      0      0      0   100%
...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44842 from HyukjinKwon/SPARK-46802.

Authored-by: Hyukjin Kwon <gurwls...@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/run-tests-with-coverage                |  3 --
 python/test_coverage/conf/spark-defaults.conf | 21 ------------
 python/test_coverage/coverage_daemon.py       | 48 ---------------------------
 3 files changed, 72 deletions(-)

diff --git a/python/run-tests-with-coverage b/python/run-tests-with-coverage
index d1c2dacbf9d8..aa23e16e8e43 100755
--- a/python/run-tests-with-coverage
+++ b/python/run-tests-with-coverage
@@ -44,9 +44,6 @@ export PYTHONPATH="$FWDIR:$PYTHONPATH"
 # Also, our sitecustomize.py and coverage_daemon.py are included in the path.
 export PYTHONPATH="$COVERAGE_DIR:$PYTHONPATH"
 
-# We use 'spark.python.daemon.module' configuration to insert the coverage supported workers.
-export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
-
 # This environment variable enables the coverage.
 export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
 
diff --git a/python/test_coverage/conf/spark-defaults.conf b/python/test_coverage/conf/spark-defaults.conf
deleted file mode 100644
index bf44ea6e7cfe..000000000000
--- a/python/test_coverage/conf/spark-defaults.conf
+++ /dev/null
@@ -1,21 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# This is used to generate PySpark coverage results. Seems there's no way to
-# add a configuration when SPARK_TESTING environment variable is set because
-# we will directly execute modules by python -m.
-spark.python.daemon.module coverage_daemon
diff --git a/python/test_coverage/coverage_daemon.py b/python/test_coverage/coverage_daemon.py
deleted file mode 100644
index 4372135d6fc3..000000000000
--- a/python/test_coverage/coverage_daemon.py
+++ /dev/null
@@ -1,48 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-import os
-import imp
-import platform
-
-
-# This is a hack to always refer the main code rather than built zip.
-main_code_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
-daemon = imp.load_source("daemon", "%s/pyspark/daemon.py" % main_code_dir)
-
-if "COVERAGE_PROCESS_START" in os.environ:
-    # PyPy with coverage makes the tests flaky, and CPython is enough for coverage report.
-    if "pypy" not in platform.python_implementation().lower():
-        worker = imp.load_source("worker", "%s/pyspark/worker.py" % main_code_dir)
-
-        def _cov_wrapped(*args, **kwargs):
-            import coverage
-            cov = coverage.coverage(
-                config_file=os.environ["COVERAGE_PROCESS_START"])
-            cov.start()
-            try:
-                worker.main(*args, **kwargs)
-            finally:
-                cov.stop()
-                cov.save()
-        daemon.worker_main = _cov_wrapped
-else:
-    raise RuntimeError("COVERAGE_PROCESS_START environment variable is not set, exiting.")
-
-
-if __name__ == '__main__':
-    daemon.manager()
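For context on the "proper approach" mentioned in the commit message: the removed daemon monkey-patched `daemon.worker_main`, whereas coverage.py's documented mechanism for measuring subprocesses is to point `COVERAGE_PROCESS_START` at a config file (which `run-tests-with-coverage` already exports, per the diff above) and call `coverage.process_startup()` from a module imported at interpreter startup, typically `sitecustomize.py`. The snippet below is only a minimal sketch of that standard hook; it is not quoted from Spark's `python/test_coverage/sitecustomize.py`.

```python
# sitecustomize.py -- minimal sketch of coverage.py's standard subprocess hook.
# If COVERAGE_PROCESS_START is set (e.g. to the .coveragerc exported by
# run-tests-with-coverage), process_startup() begins coverage measurement for
# this Python process; if the variable is unset, it is a no-op, so ordinary
# runs are unaffected.
import coverage

coverage.process_startup()
```

Whether this hook alone is enough to reliably cover forked PySpark worker processes is exactly the open question the commit message defers to a follow-up.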