Rafal Wojdyla created SPARK-48711:
-------------------------------------

             Summary: OOM killer may leave SparkContext in broken state causing 
ConnectionRefusedError
                 Key: SPARK-48711
                 URL: https://issues.apache.org/jira/browse/SPARK-48711
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Spark Core
    Affects Versions: 3.5.0
            Reporter: Rafal Wojdyla


Related to https://issues.apache.org/jira/browse/SPARK-18523, and 
https://github.com/apache/spark/pull/15961. I'm currently on:

{code}
pyspark                   3.5.0              pyhd8ed1ab_0    conda-forge
py4j                      0.10.9.7           pyhd8ed1ab_0    conda-forge
{code}

When the Spark JVM process gets OOM-killed, {{SparkContext.stop}} fails with 
{{ConnectionRefusedError}}, which leaves the {{SparkSession}}/{{SparkContext}} 
in a "dirty" state. https://issues.apache.org/jira/browse/SPARK-18523 addressed 
this by catching {{Py4JError}}, but it looks like the code now raises 
{{ConnectionRefusedError}} instead:

{code}
Traceback (most recent call last):
  ...
  File "<TRUNC>/lib/python3.11/site-packages/pyspark/sql/session.py", line 1796, in stop
    self._sc.stop()
  File "<TRUNC>/lib/python3.11/site-packages/pyspark/context.py", line 654, in stop
    self._jsc.stop()
  File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1321, in __call__
    answer = self.gateway_client.send_command(command)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1036, in send_command
    connection = self._get_connection()
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 284, in _get_connection
    connection = self._create_new_connection()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 291, in _create_new_connection
    connection.connect_to_java_server()
  File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 438, in connect_to_java_server
    self.socket.connect((self.java_address, self.java_port))
ConnectionRefusedError: [Errno 111] Connection refused
{code}
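
For illustration, here is a minimal sketch of the kind of handling that seems to be missing: treating {{ConnectionRefusedError}} the same way as {{Py4JError}} when stopping. The helper name {{stop_quietly}} and the fallback {{Py4JError}} stand-in are hypothetical and not part of PySpark; this is a caller-side workaround under those assumptions, not the actual fix.

```python
import logging

try:
    # py4j ships alongside PySpark; this is the real exception class.
    from py4j.protocol import Py4JError
except ImportError:
    # Hypothetical stand-in so the sketch still runs without py4j installed.
    class Py4JError(Exception):
        pass

logger = logging.getLogger(__name__)


def stop_quietly(stop_fn):
    """Call stop_fn and tolerate an unreachable JVM gateway.

    Returns True if stop_fn completed, False if the gateway was
    unreachable. Any other exception propagates unchanged.
    """
    try:
        stop_fn()
        return True
    except (Py4JError, ConnectionRefusedError) as exc:
        # The JVM is presumably gone (e.g. OOM-killed), so there is
        # nothing left to stop on the Java side.
        logger.warning("JVM gateway unreachable during stop: %r", exc)
        return False
```

It would be called as {{stop_quietly(spark.stop)}}. Note that this only suppresses the error at the call site; it does not by itself reset whatever state PySpark keeps internally, which is why handling this inside {{SparkContext.stop}} seems preferable.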



--
This message was sent by Atlassian Jira
(v8.20.10#820010)