[
https://issues.apache.org/jira/browse/SPARK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
L. C. Hsieh updated SPARK-34726:
Description:
One of our customers frequently encounters "serve-DataFrame"
java.net.SocketTimeoutException: Accept timed out errors in PySpark because
Dataset.collectToPython() in Spark 2.4 does the following:
# Collects the results
# Opens a socket server that then listens for a connection from the Python side
# Runs the event listeners as part of withAction on the same thread, because
SPARK-25680 is not available in Spark 2.4
# Returns the address of the socket server to Python
# The Python side connects to the socket server and fetches the data
Because the customer has a custom, long-running event listener, the time between
steps 2 and 5 is frequently longer than the default connection timeout. Increasing
the connect timeout is not a good solution, as we don't know how long the
listeners can take to run.
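The ordering problem in the steps above can be reproduced outside Spark. The following is only a minimal sketch using plain Python sockets, not Spark's actual code: the function name simulate_collect, its parameters, and the sleep standing in for a slow event listener are all assumptions made for illustration. It opens a server socket with an accept timeout (step 2), delays on the calling thread (step 3), and only then lets the client connect (steps 4 and 5).

```python
import socket
import threading
import time


def simulate_collect(accept_timeout: float, listener_delay: float) -> str:
    """Return "served" if the client connects in time, else "accept timed out".

    Hypothetical stand-in for the sequence described above; not Spark code.
    """
    # Step 2: open a server socket on an ephemeral local port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server.settimeout(accept_timeout)  # accept() gives up after this many seconds
    port = server.getsockname()[1]

    result = {}

    def serve():
        # Serving thread: waits for the client, analogous to a "serve-DataFrame" thread.
        try:
            conn, _ = server.accept()
            with conn:
                conn.sendall(b"rows")
            result["status"] = "served"
        except socket.timeout:
            result["status"] = "accept timed out"
        finally:
            server.close()

    serving_thread = threading.Thread(target=serve)
    serving_thread.start()

    # Step 3: a slow listener blocks the calling thread (simulated with sleep).
    time.sleep(listener_delay)

    # Steps 4 and 5: only now does the client learn the port and connect.
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1.0) as client:
            client.recv(16)
    except OSError:
        pass  # the server already gave up and closed its socket

    serving_thread.join()
    return result["status"]
```

Calling simulate_collect(accept_timeout=0.2, listener_delay=0.6) yields "accept timed out", while simulate_collect(accept_timeout=2.0, listener_delay=0.05) yields "served". In this sketch, raising accept_timeout only helps while it stays above the listener delay, which is the reason a larger timeout does not address the underlying ordering.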
> Fix collectToPython timeouts
>
>
> Key: SPARK-34726
> URL: https://issues.apache.org/jira/browse/SPARK-34726
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7
> Reporter: Peter Toth
> Assignee: Peter Toth
> Priority: Major
> Fix For: 2.4.8