Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20473#discussion_r165350209

    --- Diff: python/run-tests.py ---
    @@ -151,6 +151,38 @@ def parse_opts():
         return opts

    +def _check_dependencies(python_exec, modules_to_test):
    +    if "COVERAGE_PROCESS_START" in os.environ:
    +        # Make sure if coverage is installed.
    +        try:
    +            subprocess_check_output(
    +                [python_exec, "-c", "import coverage"],
    +                stderr=open(os.devnull, 'w'))
    +        except:
    +            print_red("Coverage is not installed in Python executable '%s' "
    +                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
    +                      "exiting." % python_exec)
    +            sys.exit(-1)
    +
    +    if pyspark_sql in modules_to_test:
    +        # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
    +        # explicitly prints out. See SPARK-23300.
    +        try:
    +            subprocess_check_output(
    +                [python_exec, "-c", "import pyarrow"],
    +                stderr=open(os.devnull, 'w'))
    --- End diff --

    Otherwise, it prints out the exception too, for example:

    ```
    Will test the following Python modules: ['pyspark-sql']
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named foo
    PyArrow is not installed in Python executable 'python2.7', skipping related tests in 'pyspark-sql'.
    ```
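For illustration only, the stderr redirection discussed above can be sketched as a standalone helper. The name `module_available` is hypothetical and not part of `run-tests.py`. The sketch calls the standard `subprocess.check_output` directly rather than the script's `subprocess_check_output` helper. It also closes the devnull handle and catches narrower exceptions than the bare `except:` in the diff; those are choices made for this sketch, not something the PR does.

```python
import os
import subprocess
import sys


def module_available(python_exec, module):
    """Return True if `module` imports cleanly under `python_exec`.

    Hypothetical helper, not taken from run-tests.py.
    """
    # Send the child's stderr to os.devnull so a failed import does not
    # print its traceback into the test runner's output.
    with open(os.devnull, "w") as devnull:
        try:
            subprocess.check_output(
                [python_exec, "-c", "import " + module],
                stderr=devnull)
        except (subprocess.CalledProcessError, OSError):
            # Non-zero exit (import failed) or the executable could not be run.
            return False
    return True


if __name__ == "__main__":
    for name in ("pyarrow", "pandas"):
        if not module_available(sys.executable, name):
            print("%s is not installed in Python executable '%s'."
                  % (name, sys.executable))
```

Without the `stderr=devnull` argument, the child's `ImportError` traceback would appear before the runner's own message, as in the output quoted above.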