juliuszsompolski commented on PR #42908:
URL: https://github.com/apache/spark/pull/42908#issuecomment-1721459768

   @dongjoon-hyun I don't think the SparkConnectSessionHolderSuite failures are 
related to this PR, and I'm not sure what's going on there.
   ```
   Streaming foreachBatch worker is starting with url sc://localhost:15002/;user_id=testUser and sessionId 9863bb98-6682-43ad-bc86-b32d8486fb47.
   Traceback (most recent call last):
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
       import pandas
   ModuleNotFoundError: No module named 'pandas'
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
       exec(code, run_globals)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py", line 86, in <module>
       main(sock_file, sock_file)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py", line 51, in main
       spark_connect_session = SparkSession.builder.remote(connect_url).getOrCreate()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/session.py", line 464, in getOrCreate
       from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/session.py", line 19, in <module>
       check_dependencies(__name__)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies
       require_minimum_pandas_version()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 34, in require_minimum_pandas_version
       raise ImportError(
   ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
   [info] - python foreachBatch process: process terminates after query is stopped *** FAILED *** (1 second, 115 milliseconds)
   
   Streaming query listener worker is starting with url sc://localhost:15002/;user_id=testUser and sessionId ab6cfcde-a9f1-4b96-8ca3-7aab5c6ff438.
   Traceback (most recent call last):
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
       import pandas
   ModuleNotFoundError: No module named 'pandas'
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
       exec(code, run_globals)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/listener_worker.py", line 99, in <module>
       main(sock_file, sock_file)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/listener_worker.py", line 59, in main
       spark_connect_session = SparkSession.builder.remote(connect_url).getOrCreate()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/session.py", line 464, in getOrCreate
       from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/session.py", line 19, in <module>
       check_dependencies(__name__)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies
       require_minimum_pandas_version()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 34, in require_minimum_pandas_version
       raise ImportError(
   ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
   [info] - python listener process: process terminates after listener is removed *** FAILED *** (434 milliseconds)
   [info]   java.io.EOFException:
   ```
   It looks to me like some (intermittent?) environment issue: per the tracebacks, `pandas` simply isn't installed in the Python environment the streaming workers run in.
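   
   For what it's worth, both workers die the same way because the Spark Connect Python client runs a dependency check at import time: `check_dependencies()` calls `require_minimum_pandas_version()`, which converts the missing module into the `ImportError` above. Here is a minimal sketch of that check, not the exact pyspark source (the real code lives in `pyspark/sql/pandas/utils.py`; `MINIMUM_PANDAS_VERSION` and `_version_tuple` are hypothetical helpers for illustration):
   
   ```python
   # Simplified sketch of the pandas dependency check; not the actual
   # pyspark implementation, which differs in details.
   MINIMUM_PANDAS_VERSION = "1.0.5"  # hypothetical constant for this sketch
   
   def _version_tuple(version: str) -> tuple:
       # Hypothetical helper: naive numeric split, assumes plain "X.Y.Z" strings.
       return tuple(int(part) for part in version.split(".")[:3])
   
   def require_minimum_pandas_version() -> None:
       try:
           import pandas
       except ImportError as error:
           # The CI run hit this branch: pandas was absent, so the
           # ModuleNotFoundError was chained into the ImportError seen above.
           raise ImportError(
               f"Pandas >= {MINIMUM_PANDAS_VERSION} must be installed; "
               "however, it was not found."
           ) from error
       if _version_tuple(pandas.__version__) < _version_tuple(MINIMUM_PANDAS_VERSION):
           raise ImportError(
               f"Pandas >= {MINIMUM_PANDAS_VERSION} must be installed; "
               f"however, your version was {pandas.__version__}."
           )
   ```
   
   So any worker interpreter that lacks pandas will fail this check the same way, which would explain both tests failing identically regardless of this PR's changes.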

