[ https://issues.apache.org/jira/browse/SPARK-33143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215249#comment-17215249 ]

Miklos Szurap commented on SPARK-33143:
---------------------------------------

It has been observed with big RDDs.
{noformat}
20/10/07 18:27:20 INFO scheduler.DAGScheduler: Job 311 finished: toPandas at /data/1/app/bin/apps/report/doreport.py:91, took 0.619208 s
Exception in thread "serve-DataFrame" java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
        at java.net.ServerSocket.implAccept(ServerSocket.java:545)
        at java.net.ServerSocket.accept(ServerSocket.java:513)
        at org.apache.spark.api.python.PythonServer$$anon$1.run(PythonRDD.scala:881)
Traceback (most recent call last):
  File "/data/1/app/bin/apps/report/doreport.py", line 91, in <module>
    df=dq_final.toPandas()
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 2142, in toPandas
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 534, in collect
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 144, in _load_from_socket
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 178, in local_connect_and_auth
Exception: could not open socket: ["tried to connect to ('127.0.0.1', 33127), but an error occured: "]
20/10/07 18:27:36 INFO spark.SparkContext: Invoking stop() from shutdown hook
{noformat}
After splitting the application into two parts, so that each run processed only half of the data, it finished successfully.
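
The failure mode in the log is an ordinary accept timeout on the JVM-side serving socket: the Python process does not connect back in time, so the server's {{accept()}} gives up. This is not Spark code, just a minimal stand-alone Python sketch of the same mechanism, with 0.2 s standing in for Spark's hardcoded 15 s:
{code:python}
import socket

# Listening socket with a short accept timeout, analogous to
# serverSocket.setSoTimeout(15000) in PythonServer.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # ephemeral port, like the 33127 in the traceback
srv.listen(1)
srv.settimeout(0.2)          # 0.2 s stands in for Spark's 15 s

try:
    srv.accept()             # no client ever connects
    timed_out = False
except socket.timeout:
    timed_out = True
finally:
    srv.close()

print(timed_out)  # True
{code}
With a large {{toPandas()}} result, the driver can plausibly spend longer than the fixed window before the Python side connects, which is why a configurable timeout would help here.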

> Make SocketAuthServer socket timeout configurable
> -------------------------------------------------
>
>                 Key: SPARK-33143
>                 URL: https://issues.apache.org/jira/browse/SPARK-33143
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.7, 3.0.1
>            Reporter: Miklos Szurap
>            Priority: Major
>
> In SPARK-21551 the socket timeout for PySpark applications was increased 
> from 3 to 15 seconds. However, it is still hardcoded.
> In certain situations even 15 seconds is not enough, so it should be made 
> configurable.
> This is requested after seeing it in real-life workload failures.
> It has also been suggested and requested in an earlier comment on 
> [SPARK-18649|https://issues.apache.org/jira/browse/SPARK-18649?focusedCommentId=16493498&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16493498]
> In Spark 2.4 it is in
> [PythonRDD.scala|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L899]
> and in Spark 3.x the code has been moved to
> [SocketAuthServer.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/security/SocketAuthServer.scala#L51]
> {code}
> serverSocket.setSoTimeout(15000)
> {code}
> Please include this change in both the 2.4 and 3.x branches.
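
As a rough illustration of the requested change (in Python for brevity; the actual fix would be in the Scala code linked above), the idea is to read the accept timeout from configuration with the current 15 s as the default. The key name {{spark.python.authSocketTimeout}} is only an assumed placeholder here, not an existing 2.4/3.0 config:
{code:python}
# Hypothetical sketch only: resolve the accept timeout from configuration,
# falling back to today's hardcoded 15 s. The key name is an assumed example.
DEFAULT_TIMEOUT_S = 15

def auth_socket_timeout_ms(conf):
    """Return the accept timeout in milliseconds, as setSoTimeout() expects."""
    return int(conf.get("spark.python.authSocketTimeout", DEFAULT_TIMEOUT_S)) * 1000

print(auth_socket_timeout_ms({}))                                      # 15000
print(auth_socket_timeout_ms({"spark.python.authSocketTimeout": 60}))  # 60000
{code}
Keeping 15 s as the default preserves today's behavior for existing applications while letting heavy {{toPandas()}} workloads raise the limit.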


