[jira] [Commented] (SPARK-25390) Data source V2 API refactoring

Jan Berkel (Jira) Mon, 12 Oct 2020 14:19:26 -0700


    [ https://issues.apache.org/jira/browse/SPARK-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212675#comment-17212675 ]


Jan Berkel commented on SPARK-25390:
------------------------------------

I'm in a similar situation. [~Kyrdan] asked on the mailing list as directed, 
but nobody replied. It's strange that such a central API is completely 
undocumented. The new iteration of the datasource API doesn't look remotely 
like v2; it might as well have been called v3.

If it's not possible to provide documentation, at least put some notes or 
warnings in the migration guide or changelog indicating that Spark 3's 
datasource API has changed completely.

Also, as far as I can tell, it isn't currently possible to implement the new 
Datasource V2 using plain Java classes.

> Data source V2 API refactoring
> ------------------------------
>
>                 Key: SPARK-25390
>                 URL: https://issues.apache.org/jira/browse/SPARK-25390
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently it's not very clear how we should abstract the data source v2 API. 
> The abstraction should either be unified between batch and streaming, or be 
> similar but with a well-defined difference between batch and streaming. The 
> abstraction should also include catalog/table.
> An example of the abstraction:
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}
> We should refactor the data source v2 API according to the abstraction.
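
The layering in the issue description can be illustrated with a small toy model. This is a minimal sketch only: the interfaces `Catalog`, `Table`, `TableStream` and `Scan` and the classes `InMemoryCatalog`/`InMemoryTable` are hypothetical names chosen for illustration, not Spark's actual connector API. Batch reads go catalog -> table -> scan, while streaming reads add a stream layer that remembers an offset and hands out one bounded scan per micro-batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo entry point, declared first so `java LayeredSourceDemo.java` launches it.
public class LayeredSourceDemo {
    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        InMemoryTable events = catalog.createTable("events");
        events.append("a");
        events.append("b");

        // Batch: catalog -> table -> scan
        System.out.println(catalog.loadTable("events").newScan().readRows()); // [a, b]

        // Streaming: catalog -> table -> stream -> scan (one scan per micro-batch)
        TableStream stream = catalog.loadTable("events").newStream();
        System.out.println(stream.nextScan().readRows()); // [a, b]
        events.append("c");
        System.out.println(stream.nextScan().readRows()); // [c]
    }
}

// HYPOTHETICAL interfaces modelling the layering; NOT Spark's connector API.
interface Catalog {
    Table loadTable(String name); // resolve a table name to a table
}

interface Table {
    Scan newScan();          // batch path: table -> scan
    TableStream newStream(); // streaming path: table -> stream -> scan
}

interface TableStream {
    Scan nextScan(); // a bounded scan covering rows not yet returned
}

interface Scan {
    List<String> readRows(); // the rows covered by this scan
}

// Toy table backed by an in-memory list of rows (hypothetical).
final class InMemoryTable implements Table {
    private final List<String> rows = new ArrayList<>();

    void append(String row) {
        rows.add(row);
    }

    @Override
    public Scan newScan() {
        // Snapshot the current contents so the batch scan stays bounded.
        List<String> snapshot = new ArrayList<>(rows);
        return () -> snapshot;
    }

    @Override
    public TableStream newStream() {
        return new TableStream() {
            private int offset = 0; // number of rows already handed out

            @Override
            public Scan nextScan() {
                // Copy only the rows appended since the previous micro-batch.
                List<String> batch = new ArrayList<>(rows.subList(offset, rows.size()));
                offset = rows.size();
                return () -> batch;
            }
        };
    }
}

// Toy catalog mapping names to in-memory tables (hypothetical).
final class InMemoryCatalog implements Catalog {
    private final Map<String, InMemoryTable> tables = new HashMap<>();

    InMemoryTable createTable(String name) {
        InMemoryTable table = new InMemoryTable();
        tables.put(name, table);
        return table;
    }

    @Override
    public Table loadTable(String name) {
        Table table = tables.get(name);
        if (table == null) {
            throw new IllegalArgumentException("No such table: " + name);
        }
        return table;
    }
}
```

In this sketch both paths end in the same `Scan` type, so row-reading code can be shared; only the stream layer, which tracks an offset between micro-batches, is streaming-specific. That is one way to get "similar but with a well-defined difference" between batch and streaming.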



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
