Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20473#discussion_r165445947

    --- Diff: python/run-tests.py ---
    @@ -151,6 +151,38 @@ def parse_opts():
         return opts

    +def _check_dependencies(python_exec, modules_to_test):
    +    if "COVERAGE_PROCESS_START" in os.environ:
    +        # Make sure if coverage is installed.
    +        try:
    +            subprocess_check_output(
    +                [python_exec, "-c", "import coverage"],
    +                stderr=open(os.devnull, 'w'))
    +        except:
    +            print_red("Coverage is not installed in Python executable '%s' "
    +                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
    +                      "exiting." % python_exec)
    +            sys.exit(-1)
    +
    +    if pyspark_sql in modules_to_test:
    +        # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
    +        # explicitly prints out. See SPARK-23300.
    +        try:
    +            subprocess_check_output(
    +                [python_exec, "-c", "import pyarrow"],
    +                stderr=open(os.devnull, 'w'))
    +        except:
    --- End diff --

    Actually, since we are here, is it possible to do the same thing as
    https://github.com/apache/spark/blob/ec63e2d0743a4f75e1cce21d0fe2b54407a86a4a/python/pyspark/sql/tests.py#L51-L63
    and
    https://github.com/apache/spark/blob/ec63e2d0743a4f75e1cce21d0fe2b54407a86a4a/python/pyspark/sql/tests.py#L78-L84?
    It would be nice to use the same logic. Otherwise, even if this check
    passes and no warning is printed here, tests may still get skipped
    because the installed version is too old.