[jira] [Commented] (SPARK-10815) API design: data sources and sinks

Reynold Xin (JIRA) Mon, 19 Sep 2016 17:57:06 -0700

    [ https://issues.apache.org/jira/browse/SPARK-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505186#comment-15505186 ]


Reynold Xin commented on SPARK-10815:
-------------------------------------

Source depends on DataFrame, which in practice can only be created from RDDs. 
We are effectively depending on a wide, end-user-facing API just to provide 
data.
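
To make the concern concrete, here is a minimal sketch of what a narrower source contract might look like. It is written in Python purely for illustration and is not Spark's API: `NarrowSource`, `InMemorySource`, `latest_offset`, `read`, and the integer offsets are all hypothetical names and assumptions. The point is only that the contract exchanges plain records and offsets, so an implementation never has to touch RDD or DataFrame.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Sequence, Tuple

# Hypothetical record type: a plain tuple stands in for a row of data.
Row = Tuple[object, ...]


class NarrowSource(ABC):
    """Hypothetical engine-agnostic source contract (not Spark's API).

    Nothing in these signatures mentions RDD or DataFrame, so the
    surface that must stay stable across versions is small.
    """

    @abstractmethod
    def schema(self) -> Sequence[str]:
        """Return the column names of the records this source produces."""

    @abstractmethod
    def latest_offset(self) -> Optional[int]:
        """Return the highest available offset, or None if there is no data."""

    @abstractmethod
    def read(self, start: Optional[int], end: int) -> Iterator[Row]:
        """Yield records whose offsets lie in the range (start, end]."""


class InMemorySource(NarrowSource):
    """Toy implementation backed by a list; offsets are list indices."""

    def __init__(self, records):
        self._records = list(records)

    def schema(self):
        return ["value"]

    def latest_offset(self):
        return len(self._records) - 1 if self._records else None

    def read(self, start, end):
        # start is exclusive; None means "from the beginning".
        first = 0 if start is None else start + 1
        for offset in range(first, end + 1):
            yield (self._records[offset],)
```

Under these assumptions an engine would poll `latest_offset()` and then call `read(previous, latest)` for each batch, wrapping the returned records in whatever internal representation it uses.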


> API design: data sources and sinks
> ----------------------------------
>
>                 Key: SPARK-10815
>                 URL: https://issues.apache.org/jira/browse/SPARK-10815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Reynold Xin
>
> The existing (in 2.0) source/sink interface for structured streaming depends 
> on RDDs. This dependency has two issues:
> 1. The RDD interface is wide and difficult to stabilize across versions. This 
> is similar to point 1 in https://issues.apache.org/jira/browse/SPARK-15689. 
> Ideally, a source/sink implementation created for Spark 2.x should work in 
> Spark 10.x, assuming the JVM is still around.
> 2. It is difficult to swap in/out a different execution engine.
> The purpose of this ticket is to create a stable interface that addresses 
> both issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

