JoshRosen commented on PR #36775: URL: https://github.com/apache/spark/pull/36775#issuecomment-1171817134
> Thanks for you guys reviews, I think @JoshRosen's way is more appropriate, we better change the implementation to "false negatives", though it may need more efforts to implement it and it's a breaking change to the users. > > Do we need to implement it or change this to identify whether it's an exception from `DFSClient.checkOpen` to make the codes more robust? Maybe we can tackle the issues separately? In the long-term I think we should reconsider the overall approach for detecting corrupt files and maybe move towards a "false negatives" design. This would be a breaking change and might require datasource changes, so I think it deserves a separate and broader discussion (maybe on a separate JIRA or on the dev mailing list). In the short-term, I think it's okay to make an incremental improvement to the current design and fix the specific issue targeted by this PR. Merging the slightly-more-robust-via-checking-exception-source version of the change you proposed here delivers real value to users and solves a real problem even if it doesn't completely solve the larger pre-existing design problem in this feature. Could you update this PR to include the more robust method of error checking and ping us for a final look? Please add a code comment near the modified code which references SPARK-39389 and explains the motivation for the change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org