[ 
https://issues.apache.org/jira/browse/SPARK-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-23202:
-----------------------------------
    Description: 
The current DataSourceWriter API makes it hard to implement 
{{onTaskCommit(taskCommit: TaskCommitMessage)}} in {{FileCommitProtocol}}.
In general, on receiving a commit message, the driver can start processing 
messages (e.g. persisting them into files) before all the messages are 
collected.

The proposal is to add a new API:
 {{add(WriterCommitMessage message)}}: Handles a commit message received 
from a successful data writer.

This should make the whole API of DataSourceWriter compatible with 
{{FileCommitProtocol}}, and more flexible.
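
As a rough illustration of the idea (not the actual Spark code), the sketch below shows a writer that handles each commit message as soon as it arrives instead of waiting for the full set. Only {{add(WriterCommitMessage message)}} comes from this proposal; the interfaces are simplified stand-ins declared locally, and {{SketchDataSourceWriter}}, {{FileCommitMessage}} and {{EagerFileWriter}} are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Spark's WriterCommitMessage so the sketch compiles on its own.
interface WriterCommitMessage {}

// Hypothetical, simplified writer interface (NOT Spark's real DataSourceWriter).
interface SketchDataSourceWriter {
    // Proposed hook: invoked on the driver each time one data writer succeeds.
    // The default is a no-op, so writers that do not need it are unaffected.
    default void add(WriterCommitMessage message) {}

    // Invoked once, after all data writers have committed.
    void commit(WriterCommitMessage[] messages);
}

// Hypothetical message carrying the path of the file one task wrote.
final class FileCommitMessage implements WriterCommitMessage {
    final String path;

    FileCommitMessage(String path) {
        this.path = path;
    }
}

// Hypothetical writer that processes messages eagerly in add(...).
final class EagerFileWriter implements SketchDataSourceWriter {
    private final List<String> persisted = new ArrayList<>();
    private boolean committed = false;

    @Override
    public void add(WriterCommitMessage message) {
        // Stand-in for real work such as persisting the message into a file.
        persisted.add(((FileCommitMessage) message).path);
    }

    @Override
    public void commit(WriterCommitMessage[] messages) {
        // By now every message should already have been handled by add(...).
        if (persisted.size() != messages.length) {
            throw new IllegalStateException("some commit messages were not handled");
        }
        committed = true;
    }

    List<String> persistedPaths() {
        return new ArrayList<>(persisted);
    }

    boolean isCommitted() {
        return committed;
    }
}
```

In this sketch the driver would call {{add}} once per successful task and {{commit}} once at the end, which mirrors how {{onTaskCommit}} is followed by a job-level commit in {{FileCommitProtocol}}.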

There was another, more radical attempt in 
[#20386|https://github.com/apache/spark/pull/20386]. Creating a new API, as in 
[#20454|https://github.com/apache/spark/pull/20454], is more reasonable.

  was:
The current DataSourceWriter API makes it hard to implement 
{{onTaskCommit(taskCommit: TaskCommitMessage)}} in {{FileCommitProtocol}}.
In general, on receiving a commit message, the driver can start processing 
messages (e.g. persisting them into files) before all the messages are 
collected.

The proposal is to add a new API:
{{add(WriterCommitMessage message)}}: Handles a commit message received 
from a successful data writer.

This should make the whole API of DataSourceWriter compatible with 
{{FileCommitProtocol}}, and more flexible.

There was another radical attempt in 
[#20386|https://github.com/apache/spark/pull/20386]. Creating a new API is more 
reasonable.


> Add new API in DataSourceWriter: onDataWriterCommit
> ---------------------------------------------------
>
>                 Key: SPARK-23202
>                 URL: https://issues.apache.org/jira/browse/SPARK-23202
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> The current DataSourceWriter API makes it hard to implement 
> {{onTaskCommit(taskCommit: TaskCommitMessage)}} in {{FileCommitProtocol}}.
> In general, on receiving a commit message, the driver can start processing 
> messages (e.g. persisting them into files) before all the messages are 
> collected.
> The proposal is to add a new API:
>  {{add(WriterCommitMessage message)}}: Handles a commit message received 
> from a successful data writer.
> This should make the whole API of DataSourceWriter compatible with 
> {{FileCommitProtocol}}, and more flexible.
> There was another, more radical attempt in 
> [#20386|https://github.com/apache/spark/pull/20386]. Creating a new API, as in 
> [#20454|https://github.com/apache/spark/pull/20454], is more reasonable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
