juliuszsompolski commented on PR #42908:
URL: https://github.com/apache/spark/pull/42908#issuecomment-1721459768
@dongjoon-hyun I don't think the SparkConnectSessionHolderSuite failures are
related, and I don't know what's going on there.
```
Streaming foreachBatch worker is starting with url sc://localhost:15002/;user_id=testUser and sessionId 9863bb98-6682-43ad-bc86-b32d8486fb47.
Traceback (most recent call last):
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
    import pandas
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py", line 86, in <module>
    main(sock_file, sock_file)
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py", line 51, in main
    spark_connect_session = SparkSession.builder.remote(connect_url).getOrCreate()
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/session.py", line 464, in getOrCreate
    from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/session.py", line 19, in <module>
    check_dependencies(__name__)
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies
    require_minimum_pandas_version()
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 34, in require_minimum_pandas_version
    raise ImportError(
ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
[info] - python foreachBatch process: process terminates after query is stopped *** FAILED *** (1 second, 115 milliseconds)
Streaming query listener worker is starting with url sc://localhost:15002/;user_id=testUser and sessionId ab6cfcde-a9f1-4b96-8ca3-7aab5c6ff438.
Traceback (most recent call last):
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
    import pandas
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/listener_worker.py", line 99, in <module>
    main(sock_file, sock_file)
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/listener_worker.py", line 59, in main
    spark_connect_session = SparkSession.builder.remote(connect_url).getOrCreate()
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/session.py", line 464, in getOrCreate
    from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/session.py", line 19, in <module>
    check_dependencies(__name__)
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies
    require_minimum_pandas_version()
  File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 34, in require_minimum_pandas_version
    raise ImportError(
ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
[info] - python listener process: process terminates after listener is removed *** FAILED *** (434 milliseconds)
[info] java.io.EOFException:
```
It looks to me like some (intermittent?) environment issue: the Python environment used by these worker processes apparently doesn't have pandas installed.
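
For context, the failure path is the Spark Connect dependency check: the foreachBatch/listener worker creates a remote SparkSession, which runs `check_dependencies`, which calls `require_minimum_pandas_version()`. A minimal sketch of that check, reconstructed from the traceback above (the actual code in `python/pyspark/sql/pandas/utils.py` may differ in details):

```
# Sketch of the check that fails in the worker, based on the traceback above;
# the real implementation in pyspark/sql/pandas/utils.py may differ.
def require_minimum_pandas_version() -> None:
    minimum_pandas_version = "1.0.5"
    try:
        import pandas  # raises ModuleNotFoundError when pandas is absent
    except ImportError as error:
        raise ImportError(
            f"Pandas >= {minimum_pandas_version} must be installed; "
            "however, it was not found."
        ) from error
    # The real check also compares pandas.__version__ against the minimum.
```

If that's right, a quick sanity check in the failing CI job would be running `python3 -c "import pandas; print(pandas.__version__)"` with whatever Python interpreter the streaming worker spawns.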