[jira] [Commented] (SPARK-10815) API design: data sources and sinks

Frederick Reiss (JIRA) Mon, 19 Sep 2016 17:11:00 -0700

    [ https://issues.apache.org/jira/browse/SPARK-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505102#comment-15505102 ]


Frederick Reiss commented on SPARK-10815:
-----------------------------------------

I'm confused by the current description of this task. As far as I can see, the 
interface for sources and sinks in Structured Streaming has no direct 
dependencies on RDDs. More precisely, the four traits that directly comprise 
the interface (Source, Sink, StreamSourceProvider, and StreamSinkProvider) do 
not depend on the {{RDD}} class. The DataStreamWriter and DataFrameWriter 
classes, which currently insulate users from Source, Sink, etc., also have no 
dependencies on {{RDD}}.
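
As a rough illustration of that point, the four traits have approximately the 
shape sketched below. This is a simplified sketch, not the Spark source: 
StructType, Offset, DataFrame, and SQLContext are minimal stand-ins defined 
here only so the snippet compiles on its own, the provider parameter lists are 
abridged, and CounterSource and MemorySink are hypothetical implementations 
invented for the example. The thing to notice is that nothing in these 
signatures mentions {{RDD}}.

```scala
// Simplified stand-ins (assumptions for illustration, NOT the real Spark classes).
final case class StructType(fieldNames: Seq[String])
final case class Offset(value: Long)
final case class DataFrame(schema: StructType, rows: Seq[Seq[Any]])
final class SQLContext

// Approximate shape of the four traits; no signature refers to RDD.
trait Source {
  def schema: StructType
  def getOffset: Option[Offset]
  // Returns the data after `start` (exclusive) up to `end` (inclusive).
  def getBatch(start: Option[Offset], end: Offset): DataFrame
  def stop(): Unit
}

trait Sink {
  def addBatch(batchId: Long, data: DataFrame): Unit
}

// Parameter lists abridged for this sketch.
trait StreamSourceProvider {
  def createSource(sqlContext: SQLContext, parameters: Map[String, String]): Source
}

trait StreamSinkProvider {
  def createSink(sqlContext: SQLContext, parameters: Map[String, String]): Sink
}

// Hypothetical in-memory source that emits the numbers 0 until `count`.
class CounterSource(count: Long) extends Source {
  val schema: StructType = StructType(Seq("value"))

  def getOffset: Option[Offset] =
    if (count > 0) Some(Offset(count - 1)) else None

  def getBatch(start: Option[Offset], end: Offset): DataFrame = {
    val from = start.map(_.value + 1).getOrElse(0L)
    DataFrame(schema, (from to end.value).map(v => Seq[Any](v)))
  }

  def stop(): Unit = ()
}

// Hypothetical sink that remembers every batch it is handed.
class MemorySink extends Sink {
  val batches = scala.collection.mutable.Map.empty[Long, DataFrame]

  def addBatch(batchId: Long, data: DataFrame): Unit =
    batches.update(batchId, data)
}
```

Both example implementations exchange data purely as DataFrame values, which 
is the sense in which the interface itself has no direct dependency on 
{{RDD}}.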

Is this issue perhaps intended to refer to limitations in the implementation 
of Datasets that force unnecessary direct access to the Dataset's internal 
RDD?

> API design: data sources and sinks
> ----------------------------------
>
>                 Key: SPARK-10815
>                 URL: https://issues.apache.org/jira/browse/SPARK-10815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Reynold Xin
>
> The existing (in 2.0) source/sink interface for structured streaming depends 
> on RDDs. This dependency has two issues:
> 1. The RDD interface is wide and difficult to stabilize across versions. This 
> is similar to point 1 in https://issues.apache.org/jira/browse/SPARK-15689. 
> Ideally, a source/sink implementation created for Spark 2.x should work in 
> Spark 10.x, assuming the JVM is still around.
> 2. It is difficult to swap in/out a different execution engine.
> The purpose of this ticket is to create a stable interface that addresses 
> both of the issues above.



