[GitHub] [spark] juliuszsompolski commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-15 Thread via GitHub


juliuszsompolski commented on PR #42908:
URL: https://github.com/apache/spark/pull/42908#issuecomment-1721461319

   @LuciferYang I tried looking at 
https://github.com/apache/spark/pull/42560#issuecomment-1718968002 but have not 
reproduced it yet. If you have more instances of CI runs that failed with that 
stack overflow, those would be useful.
   Inspecting the code, I don't see how that iterator could get looped like 
that...
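
   For illustration only, here is a hypothetical sketch (not the actual Spark Connect retry code, whose class and method names differ) of how a retrying iterator *could* produce a stack overflow: if each retry wraps the iterator in a new instance and delegates recursively instead of looping, every failure adds stack frames.

   ```python
   # Hypothetical sketch: a retrying iterator that re-wraps itself on
   # failure. Each retry calls next() on a freshly nested iterator, so a
   # long chain of failures grows the call stack until it overflows.
   class RetryingIterator:
       def __init__(self, make_inner, attempts=0):
           self.make_inner = make_inner  # factory for the underlying iterator
           self.inner = make_inner()
           self.attempts = attempts

       def __iter__(self):
           return self

       def __next__(self):
           try:
               return next(self.inner)
           except ConnectionError:
               # Recursive delegation instead of a retry loop: the fix is
               # usually to re-create self.inner and loop in place.
               self.inner = RetryingIterator(self.make_inner, self.attempts + 1)
               return next(self.inner)
   ```

   Whether anything like this pattern is present in the reattachable-execute client is exactly the open question above.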


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] juliuszsompolski commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-15 Thread via GitHub


juliuszsompolski commented on PR #42908:
URL: https://github.com/apache/spark/pull/42908#issuecomment-1721459768

   @dongjoon-hyun I don't think the SparkConnectSessionHolderSuite failures are 
related, and I don't know what's going on there.
   ```
   Streaming foreachBatch worker is starting with url sc://localhost:15002/;user_id=testUser and sessionId 9863bb98-6682-43ad-bc86-b32d8486fb47.
   Traceback (most recent call last):
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
       import pandas
   ModuleNotFoundError: No module named 'pandas'

   The above exception was the direct cause of the following exception:

   Traceback (most recent call last):
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
       exec(code, run_globals)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py", line 86, in <module>
       main(sock_file, sock_file)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py", line 51, in main
       spark_connect_session = SparkSession.builder.remote(connect_url).getOrCreate()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/session.py", line 464, in getOrCreate
       from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/session.py", line 19, in <module>
       check_dependencies(__name__)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies
       require_minimum_pandas_version()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 34, in require_minimum_pandas_version
       raise ImportError(
   ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
   [info] - python foreachBatch process: process terminates after query is stopped *** FAILED *** (1 second, 115 milliseconds)

   Streaming query listener worker is starting with url sc://localhost:15002/;user_id=testUser and sessionId ab6cfcde-a9f1-4b96-8ca3-7aab5c6ff438.
   Traceback (most recent call last):
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
       import pandas
   ModuleNotFoundError: No module named 'pandas'

   The above exception was the direct cause of the following exception:

   Traceback (most recent call last):
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
       exec(code, run_globals)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/listener_worker.py", line 99, in <module>
       main(sock_file, sock_file)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/streaming/worker/listener_worker.py", line 59, in main
       spark_connect_session = SparkSession.builder.remote(connect_url).getOrCreate()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/session.py", line 464, in getOrCreate
       from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/session.py", line 19, in <module>
       check_dependencies(__name__)
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies
       require_minimum_pandas_version()
     File "/home/runner/work/apache-spark/apache-spark/python/pyspark/sql/pandas/utils.py", line 34, in require_minimum_pandas_version
       raise ImportError(
   ImportError: Pandas >= 1.0.5 must be installed; however, it was not found.
   [info] - python listener process: process terminates after listener is removed *** FAILED *** (434 milliseconds)
   [info]   java.io.EOFException:
   ```
   It looks to me like an (intermittent?) environment issue.
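
   For context, the failing frame is a dependency gate that Spark Connect runs on import. A minimal sketch of the kind of check seen in the traceback (`require_minimum_pandas_version` in `pyspark/sql/pandas/utils.py`); the exact version threshold and wording are taken from the log, but the body here is illustrative, not the real implementation:

   ```python
   # Illustrative dependency gate: fail fast with a clear message when an
   # optional-but-required dependency (pandas) is missing from the env.
   def require_minimum_pandas_version(minimum: str = "1.0.5") -> None:
       try:
           import pandas  # noqa: F401  (only checking importability here)
       except ImportError as err:
           # Chain the original ModuleNotFoundError, as in the log's
           # "The above exception was the direct cause ..." output.
           raise ImportError(
               f"Pandas >= {minimum} must be installed; however, it was not found."
           ) from err
   ```

   So the `FAILED` tests above just mean the CI worker's Python environment was missing pandas, not that the suite logic regressed.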



[GitHub] [spark] juliuszsompolski commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-13 Thread via GitHub


juliuszsompolski commented on PR #42908:
URL: https://github.com/apache/spark/pull/42908#issuecomment-1717719129

   cc @hvanhovell @dongjoon-hyun 

