Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19623#discussion_r148227459
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java ---
    @@ -34,11 +34,17 @@
     
       /**
        * Proceed to next record, returns false if there is no more records.
    +   *
    +   * If an exception was thrown, the corresponding Spark task would fail and get retried until
    +   * hitting the maximum retry times.
        */
       boolean next();
     
       /**
        * Return the current record. This method should return same value until `next` is called.
    +   *
    --- End diff --
    
    -assuming that the source data is not changed in any way. I'd avoid making any comments about what happens if it is. Maybe make that a broader requirement up front: Spark assumes that the data does not change in size, structure, or value during the query. If it does, any operation may raise an exception or return invalid or inconsistent data.
    
    That makes for a nice disclaimer: if it's a database with the right ACID level, updates to a source may not be visible. If it's a CSV file, nobody knows what will happen.
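
As a rough illustration of the contract being discussed, here is a minimal sketch of a reader that takes a snapshot of its input, so that `get` keeps returning the same value until `next` is called even if the caller later mutates the source. `RecordReader` and `SnapshotReader` are hypothetical names invented for this example; `RecordReader` only mirrors the two methods visible in the diff and is not Spark's actual `DataReader` interface.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in that mirrors the two methods shown in the diff.
// It is an assumption for illustration, not Spark's DataReader interface.
interface RecordReader<T> extends Closeable {
  /** Proceeds to the next record; returns false when there are no more. */
  boolean next();

  /** Returns the current record; stable until next() is called again. */
  T get();
}

// Reads from a private copy of the source, so later changes to the
// caller's list cannot alter what this reader returns.
final class SnapshotReader<T> implements RecordReader<T> {
  private final Iterator<T> records;
  private T current;
  private boolean positioned;

  SnapshotReader(List<T> source) {
    // Defensive copy: this is the "data does not change" assumption,
    // enforced locally instead of being left to the data source.
    this.records = new ArrayList<>(source).iterator();
  }

  @Override
  public boolean next() {
    positioned = records.hasNext();
    current = positioned ? records.next() : null;
    return positioned;
  }

  @Override
  public T get() {
    if (!positioned) {
      throw new IllegalStateException("next() has not returned true");
    }
    return current; // same reference on every call until next()
  }

  @Override
  public void close() {
    // Nothing to release for an in-memory snapshot.
  }
}
```

A source that cannot be copied cheaply (a large CSV file, say) has no such guarantee, which is why a general disclaimer in the interface documentation seems safer than promising specific behaviour.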


---
