Hi maqy,

The exception is caused by the connection being closed; one possible reason is a timeout on the datanode side, assuming we find no problem on the Spark side before the exception. So we could try to find more clues in the datanode log.

Best wishes,
Jinxin

xiaoxingstack
Email: xiaoxingst...@gmail.com
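As a concrete starting point for the timeout theory, here is a minimal sketch of raising the HDFS socket timeouts from the Spark side before reading. It assumes a live SparkSession named spark; the property names are standard Hadoop 2.x keys, and the values are only illustrative:

    // Raise HDFS client/datanode socket timeouts (milliseconds).
    // Standard Hadoop 2.x keys; values here are illustrative, tune as needed.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("dfs.client.socket-timeout", "300000")          // default 60000
    hadoopConf.set("dfs.datanode.socket.write.timeout", "600000")  // default 480000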
On 2020-04-22 23:40, maqy wrote:

Hi Jinxin,

The Spark web UI shows that all tasks completed successfully; this error appears in the shell:

    java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:244)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:733)

More information can be seen here:
https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io-eofexception-premature-e
I speculate that there is a problem with deserialization: after the web UI shows that the tasks of collect() are completed, the memory occupied by the "spark-submit" process keeps increasing. After a few minutes the memory usage stops growing, and a few minutes after that the shell reports this error.

Best regards,
maqy
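If the driver is struggling to deserialize the whole result at once, one way to test that theory is to stream the rows one partition at a time instead of materializing all of them. A minimal sketch, assuming a Dataset[Row] named df (a diagnostic aid, not a confirmed fix for this error):

    import scala.collection.JavaConverters._

    // Dataset.toLocalIterator() fetches results partition by partition,
    // so the driver only holds one partition's deserialized rows at a time
    // instead of all ~800M rows at once.
    val it = df.toLocalIterator().asScala
    var n = 0L
    for (row <- it) {
      // process each Row here
      n += 1
    }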
From: Tang Jinxin
Sent: 2020-04-22 23:16
To: maqy
Cc: user@spark.apache.org
Subject: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available

Maybe the datanode stopped the data transfer due to a timeout. Could you please provide the exception stack?

xiaoxingstack
Email: xiaoxingst...@gmail.com

On 2020-04-22 19:53, maqy wrote:

Today I met the same problem using
rdd.collect(); the element type of the rdd is Tuple2[Int, Int]. The problem appears when the amount of data reaches about 100 GB. I guess there may be something wrong with deserialization. Has anyone else encountered this problem?

Best regards,
maqy
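Since the elements are plain Tuple2[Int, Int], another option is to keep the heavy work on the executors and only collect a reduced result, rather than pulling ~100 GB of raw tuples through the driver. A sketch, assuming an RDD[(Int, Int)] named rdd (the sum is just an illustrative aggregation):

    // Aggregate on the executors; only the small reduced result
    // crosses the wire to the driver, not ~100 GB of raw tuples.
    val pairSum: (Long, Long) = rdd
      .map { case (a, b) => (a.toLong, b.toLong) }
      .reduce((x, y) => (x._1 + y._1, x._2 + y._2))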
From: maqy1...@outlook.com
Sent: 2020-04-20 10:33
To: user@spark.apache.org
Subject: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available

Hi all,

I get a Dataset[Row] through the following code:

    val df: Dataset[Row] = spark.read.format("csv").schema(schema).load("hdfs://master:9000/mydata")
After that I want to collect it to the driver:

    val df_rows: Array[Row] = df.collect()

The Spark web UI shows that all tasks have run successfully, but the application does not stop. After more than ten minutes, this error is generated in the shell:

    java.io.EOFException: Premature EOF: no length prefix available

Environment:
    Spark 2.4.3
    Hadoop 2.7.7
    Total data: about 800,000,000 rows, 12 GB

More detailed information can be seen here:
https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io-eofexception-premature-e
Does anyone know the reason?

Best regards,
maqy
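Collecting ~12 GB of rows also requires the driver to be sized for it. A sketch of the two driver-side knobs usually involved (the values are illustrative and this is not a confirmed fix for this particular error):

    import org.apache.spark.sql.SparkSession

    // spark.driver.maxResultSize caps the total serialized result size of
    // all partitions for a single action (default 1g); the driver JVM heap
    // itself is set separately, e.g. with --driver-memory at submit time.
    val spark = SparkSession.builder()
      .appName("collect-csv")
      .config("spark.driver.maxResultSize", "24g")
      .getOrCreate()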
