[ 
https://issues.apache.org/jira/browse/SPARK-21796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134960#comment-16134960
 ] 

Sean Owen commented on SPARK-21796:
-----------------------------------

OK, but you likely still have a problem somewhere if your Python processes fail 
to read data from a socket when unpickling. It could be mismatched Spark versions 
or packages, machines that are not all updated as you think they are, a config 
setting that isn't taking effect, etc. You seem to have narrowed it down to an 
environment problem, right?
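
One way to check for that kind of mismatch is to have every executor report 
which Python it is actually running and compare that with the driver. The 
following is only a rough sketch, not something taken from this issue: the 
helper names (env_fingerprint, python_versions, executor_fingerprints) are made 
up for illustration, and it assumes the workers can at least start a task. If 
they fail before that, as in the trace quoted below, the job will fail the same 
way and env_fingerprint() would have to be run by hand on each machine.

```python
import os
import platform
import sys


def env_fingerprint():
    """Return a small, hashable description of the current Python environment."""
    return (
        platform.node(),                        # host name
        platform.python_version(),              # e.g. "3.5.2"
        sys.executable,                         # interpreter actually running
        os.environ.get("PYSPARK_PYTHON", ""),   # empty string if unset
    )


def python_versions(fingerprints):
    """Collect the distinct Python versions seen across a list of fingerprints."""
    return sorted({fp[1] for fp in fingerprints})


def executor_fingerprints(sc, num_tasks=100):
    """Hypothetical helper: run env_fingerprint inside tasks on a live SparkContext.

    num_tasks is an arbitrary choice; tasks are not guaranteed to land on
    every machine, so a larger value only makes full coverage more likely.
    """
    return (
        sc.range(num_tasks, numSlices=num_tasks)
        .map(lambda _: env_fingerprint())
        .distinct()
        .collect()
    )
```

In the pyspark shell this could be used as 
python_versions([env_fingerprint()] + executor_fingerprints(sc)); more than one 
version in the result, or an unexpected interpreter path in the fingerprints, 
would point at an environment problem.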

> pyspark count failed in python3.5.2
> -----------------------------------
>
>                 Key: SPARK-21796
>                 URL: https://issues.apache.org/jira/browse/SPARK-21796
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.1
>         Environment: Python 3.5.2  Anaconda3 4.2.0
>            Reporter: cen yuhai
>         Attachments: user
>
>
> steps:
> {code}
> pyspark
> user_data = sc.textFile("/data/external_table/ods/table/dt=2017-08-17/hour=01/*.txt")
> user_data.count()
> {code}
> Exceptions:
> {code}
> Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/home/master/platform/spark/python/pyspark/worker.py", line 98, in main
>     command = pickleSer._read_with_length(infile)
>   File "/home/master/platform/spark/python/pyspark/serializers.py", line 164, in _read_with_length
>     return self.loads(obj)
>   File "/home/master/platform/spark/python/pyspark/serializers.py", line 419, in loads
>     return pickle.loads(obj, encoding=encoding)
> EOFError: Ran out of input
>         at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
> {code}
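
For reference, the "EOFError: Ran out of input" message in the trace above is 
what Python's pickle module raises when it is handed an empty payload. The 
snippet below is a minimal sketch that reproduces only that symptom in plain 
Python; it does not involve Spark and says nothing about why the worker 
received no data. The try_unpickle helper and the sample payload are 
illustrative assumptions.

```python
import pickle


def try_unpickle(payload):
    """Unpickle payload, returning the error text instead of raising on EOF."""
    try:
        return pickle.loads(payload)
    except EOFError as exc:
        return "EOFError: {}".format(exc)


# A well-formed payload round-trips normally.
good = pickle.dumps({"op": "count"})
print(try_unpickle(good))

# An empty payload produces the same message as in the trace.
print(try_unpickle(b""))
```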



