[ https://issues.apache.org/jira/browse/SPARK-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580692#comment-16580692 ]
Hyukjin Kwon commented on SPARK-25109:
--------------------------------------

It would be helpful if we could narrow this problem down.

> spark python should retry reading another datanode if the first one fails to connect
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-25109
>                 URL: https://issues.apache.org/jira/browse/SPARK-25109
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Yuanbo Liu
>            Priority: Major
>         Attachments: WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png
>
> We use this code to read parquet files from HDFS:
>
> spark.read.parquet('xxx')
>
> and it fails with the error below:
>
> !WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png!
>
> One replica of a block cannot be read for some reason, but Spark's Python path does not try another replica that is readable, so the application fails with the exception above. When I read the same file with hadoop fs -text, the content comes back correctly. It would be great if PySpark retried another replica of the block instead of failing.
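
In the meantime, a client-side retry may paper over transient connect failures. The sketch below is illustrative only: read_parquet_with_retry is a hypothetical helper, and it assumes the failure surfaces as a Py4JJavaError while the Parquet footers are read (as in the attached screenshot). It re-runs the whole spark.read.parquet call; picking a different replica happens inside the HDFS client and cannot be steered from PySpark.

{code:python}
from py4j.protocol import Py4JJavaError


def read_parquet_with_retry(spark, path, max_attempts=3):
    # Hypothetical workaround sketch: retry the whole read when a transient
    # datanode/connect error surfaces. Replica selection is handled by the
    # HDFS client, so this can only repeat the call and hope a healthy
    # replica is chosen on the next attempt.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return spark.read.parquet(path)
        except Py4JJavaError as error:
            last_error = error
            print("read attempt %d of %d failed: %s"
                  % (attempt, max_attempts, error))
    raise last_error
{code}

One caveat: spark.read.parquet only reads footers eagerly, so if the bad replica is first hit later, at action time, the failure escapes this wrapper. It narrows the window but does not fix the underlying problem, which is why a retry inside Spark itself is being requested here.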