[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

cloud-fan Wed, 31 Jan 2018 09:37:28 -0800

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20386
  
    There is a lesson I learned from streaming data source v1: even though it's 
totally internal, people are already using it and are asking us not to remove 
the API.
    
    I think the same is true for the file-based data source. It's internal, but 
people may still use it. Although we haven't found any use case for 
`onTaskCommit` among the built-in data sources, external data sources may 
require it.
    
    One possible use case: the implementation needs a 2-phase commit on the 
driver side, so it can use `onTaskCommit` to finish the first phase earlier. 
Or someone may want to collect the commit messages received so far and report 
statistics regularly, which also requires `onTaskCommit`.
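
    A rough sketch of both use cases, to make the idea concrete. Everything 
here is hypothetical: `CommitMessage`, `DriverSideWriter`, and `TwoPhaseWriter` 
are simplified stand-ins written for illustration, not Spark's actual 
interfaces or signatures. It assumes the driver calls `onTaskCommit` once per 
successful task and `commit` once at the end of the job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-task commit message (not Spark's type).
final class CommitMessage {
    final int partitionId;
    final long rowsWritten;

    CommitMessage(int partitionId, long rowsWritten) {
        this.partitionId = partitionId;
        this.rowsWritten = rowsWritten;
    }
}

// Hypothetical driver-side writer contract, reduced to the two hooks discussed.
interface DriverSideWriter {
    void onTaskCommit(CommitMessage message);   // assumed: called as each task commits
    void commit(List<CommitMessage> messages);  // assumed: called once, at job end
}

final class TwoPhaseWriter implements DriverSideWriter {
    private final List<Integer> preparedPartitions = new ArrayList<>();
    private long rowsSoFar = 0;
    private boolean committed = false;

    // Phase 1 runs early, per task, instead of waiting for the whole job.
    // The running row count doubles as the "report statistics" use case.
    @Override
    public synchronized void onTaskCommit(CommitMessage message) {
        preparedPartitions.add(message.partitionId);
        rowsSoFar += message.rowsWritten;
    }

    // Phase 2 only has to finalize what phase 1 already prepared.
    @Override
    public synchronized void commit(List<CommitMessage> messages) {
        if (preparedPartitions.size() != messages.size()) {
            throw new IllegalStateException("some tasks were never prepared");
        }
        committed = true;
    }

    public synchronized long rowsSoFar() { return rowsSoFar; }

    public synchronized boolean isCommitted() { return committed; }
}
```

    Without a per-task hook, an external source would have to do all of the 
phase-1 work inside `commit`, and would have no point at which to report 
progress while the job is still running.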



