GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/20454

    [SPARK-23202][SQL] Add new DataSourceWriter API: onDataWriterCommit 

    ## What changes were proposed in this pull request?
    
    Currently, the API `DataSourceV2Writer#commit(WriterCommitMessage[])`
    commits a writing job with a list of commit messages.
    
    This makes sense in some scenarios, e.g. `MicroBatchExecution`.
    
    However, the API makes it hard to implement
    `onTaskCommit(taskCommit: TaskCommitMessage)` in `FileCommitProtocol`.
    In general, on receiving a commit message, the driver can start
    processing messages (e.g. persisting them into files) before all the
    messages are collected.
    
    The proposal is to add a new API:
    `add(WriterCommitMessage message)`: handles a commit message on
    receiving it from a successful data writer.
    
    This should make the whole API of `DataSourceWriter` compatible with
    `FileCommitProtocol`, and more flexible.
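
A minimal sketch of the idea, not the code in this patch: the driver hands each commit message to the writer as soon as the task that produced it succeeds, and the final `commit` only finalizes the job. Every type here (`SketchWriter`, `FileMessage`, `ManifestWriter`) is a hypothetical stand-in defined locally for illustration; only the method shapes `add(WriterCommitMessage message)` and `commit(WriterCommitMessage[])` are taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalCommitSketch {

    /** Local stand-in for the commit message type; not Spark's interface. */
    interface WriterCommitMessage {}

    /** Hypothetical writer with a per-message hook plus the job-level commit. */
    interface SketchWriter {
        /** Called on the driver for each message from a successful data writer. */
        void add(WriterCommitMessage message);

        /** Called once, after all data writers have succeeded. */
        void commit(WriterCommitMessage[] messages);
    }

    /** Hypothetical message that carries the path of a file a task wrote. */
    static final class FileMessage implements WriterCommitMessage {
        final String path;

        FileMessage(String path) {
            this.path = path;
        }
    }

    /** Records each file as its message arrives instead of waiting for commit. */
    static final class ManifestWriter implements SketchWriter {
        final List<String> manifest = new ArrayList<>();
        boolean committed = false;

        @Override
        public void add(WriterCommitMessage message) {
            // Assumption: every message in this sketch is a FileMessage.
            manifest.add(((FileMessage) message).path);
        }

        @Override
        public void commit(WriterCommitMessage[] messages) {
            // The per-message work is already done; only verify and finalize.
            if (manifest.size() != messages.length) {
                throw new IllegalStateException("some commit messages were never added");
            }
            committed = true;
        }
    }

    public static void main(String[] args) {
        ManifestWriter writer = new ManifestWriter();
        WriterCommitMessage[] messages = {
            new FileMessage("part-00000"),
            new FileMessage("part-00001")
        };
        // Simulate the driver receiving messages one at a time.
        for (WriterCommitMessage message : messages) {
            writer.add(message);
        }
        writer.commit(messages);
        System.out.println("files recorded: " + writer.manifest.size());
    }
}
```

With this shape, an implementation backed by something like `FileCommitProtocol` could forward each `add` call to its own per-task hook, while sources that only need the complete list can leave `add` as a no-op.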
    
    There was another, more radical attempt in #20386. This one should be
    more reasonable.
    
    ## How was this patch tested?
    
    Unit test
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark write_api

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20454
    
----
commit 04edec2221a252ccfbcaf9e505eaae0a0f1664ab
Author: Wang Gengliang <ltnwgl@...>
Date:   2018-01-31T08:21:18Z

    new DataSourceWriter api: onDataWriterCommit

commit 89776eced1b60b1856d6157a30ad1d8be0ba0f81
Author: Wang Gengliang <ltnwgl@...>
Date:   2018-01-31T12:39:13Z

    revise comments and add test case

----

