[ https://issues.apache.org/jira/browse/SPARK-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505102#comment-15505102 ]
Frederick Reiss commented on SPARK-10815:
-----------------------------------------

I'm confused by the current description of this task. As far as I can see, the interface for sources and sinks in Structured Streaming has no direct dependencies on RDDs. More precisely, the four traits (Source, Sink, StreamSourceProvider, and StreamSinkProvider) that directly comprise the interface do not depend on the {{RDD}} class. The DataStreamWriter and DataFrameWriter classes, which currently insulate users from Source, Sink, etc., do not depend on {{RDD}} either.

Is this issue perhaps intended to refer to limitations in the implementation of Datasets that require unnecessary direct access to the Dataset's internal RDD?

> API design: data sources and sinks
> ----------------------------------
>
> Key: SPARK-10815
> URL: https://issues.apache.org/jira/browse/SPARK-10815
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Streaming
> Reporter: Reynold Xin
>
> The existing (in 2.0) source/sink interface for structured streaming depends on RDDs. This dependency has two issues:
> 1. The RDD interface is wide and difficult to stabilize across versions. This is similar to point 1 in https://issues.apache.org/jira/browse/SPARK-15689. Ideally, a source/sink implementation created for Spark 2.x should work in Spark 10.x, assuming the JVM is still around.
> 2. It is difficult to swap in/out a different execution engine.
> The purpose of this ticket is to create a stable interface that addresses the above two.