[jira] [Updated] (SPARK-34726) Fix collectToPython timeouts

2021-03-22 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-34726:

Description: 
One of our customers frequently encounters "serve-DataFrame" 
java.net.SocketTimeoutException: Accept timed out errors in PySpark because 
Dataset.collectToPython() in Spark 2.4 does the following:


# Collects the results
# Opens up a socket server that then listens for the connection from the 
Python side
# Runs the event listeners as part of withAction on the same thread, because 
SPARK-25680 is not available in Spark 2.4
# Returns the address of the socket server to Python
# The Python side connects to the socket server and fetches the data

As the customer has a custom, long-running event listener, the time between 
steps 2 and 5 is frequently longer than the default connection timeout. 
Increasing the connect timeout is not a good solution, as we don't know how 
long the listeners can take to run.
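
The ordering problem described above can be reproduced outside Spark with a 
small simulation. This is only a sketch in plain Python sockets, not Spark's 
actual code: the function name serve_with_listener_delay, its parameters, and 
the timeout values are hypothetical, and time.sleep stands in for the 
long-running event listener.

```python
import socket
import threading
import time


def serve_with_listener_delay(listener_seconds, accept_timeout_seconds):
    """Return True if the client was accepted, False if accept() timed out."""
    # Step 2: open a server socket on an ephemeral port with an accept timeout.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server.settimeout(accept_timeout_seconds)
    port = server.getsockname()[1]
    result = {}

    def accept_once():
        try:
            conn, _ = server.accept()
            conn.sendall(b"rows")  # stand-in for the collected data
            conn.close()
            result["accepted"] = True
        except socket.timeout:
            # Python analogue of the "Accept timed out" failure.
            result["accepted"] = False
        finally:
            server.close()

    serving_thread = threading.Thread(target=accept_once, name="serve-DataFrame")
    serving_thread.start()

    # Step 3: the (simulated) listeners run before the port is handed back.
    time.sleep(listener_seconds)

    # Steps 4 and 5: only now does the client learn the port and connect.
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1) as client:
            client.recv(4)
    except OSError:
        pass  # the server gave up and closed before the client arrived

    serving_thread.join()
    return result["accepted"]
```

With a listener that finishes before the accept timeout the client is served; 
once the simulated listener outlasts the timeout, the server has already given 
up by the time the client connects, which mirrors the reported failure.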


> Fix collectToPython timeouts
> 
>
> Key: SPARK-34726
> URL: https://issues.apache.org/jira/browse/SPARK-34726
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7
> Reporter: Peter Toth
> Assignee: Peter Toth
> Priority: Major
> Fix For: 2.4.8
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34726) Fix collectToPython timeouts

2021-03-12 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-34726:
---
Affects Version/s: 2.4.7 (was: 3.1.1)

> Fix collectToPython timeouts
> 
>
> Key: SPARK-34726
> URL: https://issues.apache.org/jira/browse/SPARK-34726
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7
> Reporter: Peter Toth
> Priority: Major
>



