[ https://issues.apache.org/jira/browse/SPARK-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580692#comment-16580692 ]
Hyukjin Kwon commented on SPARK-25109:
--------------------------------------

It would be helpful if we could narrow this problem down.

> spark python should retry reading another datanode if the first one fails to connect
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-25109
>                 URL: https://issues.apache.org/jira/browse/SPARK-25109
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Yuanbo Liu
>            Priority: Major
>         Attachments: WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png
>
> We use this code to read parquet files from HDFS:
>
> spark.read.parquet('xxx')
>
> and it fails with the error below:
>
> !WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png!
>
> One replica of a block cannot be read for some reason, but Spark's Python path does not try another replica that is readable, so the application fails with the exception above. When I read the same file with hadoop fs -text, the content comes back correctly. It would be great if PySpark retried another replica of the block instead of failing.
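
In the meantime, a client-side retry may paper over transient connect failures. The sketch below is illustrative only: read_parquet_with_retry is a hypothetical helper, and it assumes the failure surfaces as a Py4JJavaError while the Parquet footers are read (as in the attached screenshot). It re-runs the whole spark.read.parquet call; picking a different replica happens inside the HDFS client and cannot be steered from PySpark.

{code:python}
from py4j.protocol import Py4JJavaError


def read_parquet_with_retry(spark, path, max_attempts=3):
    # Hypothetical workaround sketch: retry the whole read when a transient
    # datanode/connect error surfaces. Replica selection is handled by the
    # HDFS client, so this can only repeat the call and hope a healthy
    # replica is chosen on the next attempt.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return spark.read.parquet(path)
        except Py4JJavaError as error:
            last_error = error
            print("read attempt %d of %d failed: %s"
                  % (attempt, max_attempts, error))
    raise last_error
{code}

One caveat: spark.read.parquet only reads footers eagerly, so if the bad replica is first hit later, at action time, the failure escapes this wrapper. It narrows the window but does not fix the underlying problem, which is why a retry inside Spark itself is being requested here.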