Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19623#discussion_r148227459

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java ---
    @@ -34,11 +34,17 @@
       /**
        * Proceed to next record, returns false if there is no more records.
    +   *
    +   * If an exception was thrown, the corresponding Spark task would fail and get retried until
    +   * hitting the maximum retry times.
        */
       boolean next();

       /**
        * Return the current record. This method should return same value until `next` is called.
    +   *
    --- End diff --

...assuming that the source data is not changed in any way. I'd avoid making any comments about what happens then. Maybe make that a broader requirement upfront: Spark assumes that the data does not change in size/structure/value during the query. If it does, any operation may raise an exception or return invalid/inconsistent data. That makes for a nice disclaimer: if it's a database with the right ACID level, updates to a source may not be visible. If it's a CSV file, nobody knows what will happen.
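For concreteness, a minimal sketch of how that class-level disclaimer might read if folded into the DataReader javadoc. The interface shape is reconstructed from the quoted diff; the disclaimer wording is only an illustration of the comment above, not the text actually proposed in the PR:

    // Illustrative sketch only: one possible phrasing of the class-level disclaimer
    // suggested in the review comment. Not the wording actually committed to Spark.

    import java.io.Closeable;

    /**
     * A data reader responsible for outputting data for one input partition.
     *
     * Spark assumes the underlying source data does not change in size, structure or
     * value while the query runs. If it does, any method of this interface may throw
     * an exception or return invalid/inconsistent data. Whether concurrent updates are
     * visible at all depends on the source (e.g. the isolation level of a database);
     * implementations are not required to detect them.
     */
    interface DataReader<T> extends Closeable {

      /**
       * Proceed to next record, returns false if there is no more records.
       */
      boolean next();

      /**
       * Return the current record. This method should return same value until `next`
       * is called.
       */
      T get();
    }

Stating the assumption once at the interface level keeps the per-method javadoc focused on retry semantics and avoids promising any particular behaviour when the source mutates mid-query.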