[jira] [Commented] (SPARK-10815) API design: data sources and sinks

Cody Koeninger (JIRA) Wed, 12 Oct 2016 15:59:37 -0700

    [ https://issues.apache.org/jira/browse/SPARK-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570139#comment-15570139 ]


Cody Koeninger commented on SPARK-10815:
----------------------------------------

Another unfortunate thing about the Sink API is that it only exposes batch ids,
with no way that I'm aware of to get at the underlying source offsets (e.g.
Kafka offsets).

For sinks that can take advantage of it, access to offsets would be preferable:
it's better for disaster recovery and doesn't lock you into a particular
streaming engine.
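To make the point concrete, here is a minimal sketch assuming a hypothetical `OffsetAwareSink` trait (not a Spark API). Compared with Spark 2.0's `Sink.addBatch(batchId: Long, data: DataFrame)`, it also passes the end offsets each batch covers, so a transactional sink can store those offsets together with its output and resume from them after a failure, independently of the engine's own checkpoint. `TopicPartition` and `InMemoryTransactionalSink` are illustrative names, and Spark types are replaced with plain Scala collections so the example runs on its own.

```scala
// Hypothetical sketch, not a Spark API. Spark types are replaced with plain
// Scala collections so the example runs on its own.
final case class TopicPartition(topic: String, partition: Int)

// Spark 2.0's Sink receives only a batch id:
//   def addBatch(batchId: Long, data: DataFrame): Unit
// This hypothetical variant also receives the end offsets the batch covers.
trait OffsetAwareSink {
  def addBatch(batchId: Long, endOffsets: Map[TopicPartition, Long], data: Seq[String]): Unit
  // Offsets recorded by the sink itself; a restart can resume from these
  // without reading the streaming engine's checkpoint.
  def committedOffsets: Map[TopicPartition, Long]
}

// Stand-in for a transactional store. In a real system, output and offsets
// would be written in one transaction; here it is a single-threaded object.
final class InMemoryTransactionalSink extends OffsetAwareSink {
  private var stored = Vector.empty[String]
  private var offsets = Map.empty[TopicPartition, Long]

  def addBatch(batchId: Long, endOffsets: Map[TopicPartition, Long], data: Seq[String]): Unit = {
    // A replayed batch whose offsets are already committed is skipped,
    // so replays after a failure do not duplicate output.
    val alreadyCommitted = endOffsets.forall { case (part, endOff) =>
      offsets.get(part).exists(_ >= endOff)
    }
    if (!alreadyCommitted) {
      stored = stored ++ data
      offsets = offsets ++ endOffsets
    }
  }

  def committedOffsets: Map[TopicPartition, Long] = offsets
  def rows: Vector[String] = stored
}

val tp = TopicPartition("events", 0)
val sink = new InMemoryTransactionalSink
sink.addBatch(0L, Map(tp -> 2L), Seq("a", "b"))
sink.addBatch(0L, Map(tp -> 2L), Seq("a", "b")) // replayed after a simulated restart
sink.addBatch(1L, Map(tp -> 3L), Seq("c"))
println(sink.rows)             // Vector(a, b, c)
println(sink.committedOffsets) // Map(TopicPartition(events,0) -> 3)
```

Because the offsets live in the sink's own store rather than only in the engine's checkpoint, the same data could be resumed by a different engine that knows how to read them, which is the lock-in concern raised above.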

> API design: data sources and sinks
> ----------------------------------
>
>                 Key: SPARK-10815
>                 URL: https://issues.apache.org/jira/browse/SPARK-10815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Reynold Xin
>
> The existing (in 2.0) source/sink interface for structured streaming depends 
> on RDDs. This dependency has two issues:
> 1. The RDD interface is wide and difficult to stabilize across versions. This 
> is similar to point 1 in https://issues.apache.org/jira/browse/SPARK-15689. 
> Ideally, a source/sink implementation created for Spark 2.x should work in 
> Spark 10.x, assuming the JVM is still around.
> 2. It is difficult to swap in/out a different execution engine.
> The purpose of this ticket is to create a stable interface that addresses both
> of the issues above.
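As a rough illustration of what an RDD-free, engine-agnostic contract could look like, here is a minimal sketch using hypothetical names (`StableSource`, `StableSink`, `LocalEngine`, `CounterSource`, `CollectingSink`). These are not Spark APIs and not the design this ticket eventually produced. The connector traits mention only plain Scala types (`Long` offsets and iterators of rows), so an engine only has to drive two calls and could be swapped without changing connector code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical contracts, not Spark APIs: they mention only plain Scala types
// (Long offsets, iterators of rows), so no RDD type leaks into connector code.
trait StableSource {
  def latestOffset(): Long
  // Rows in the half-open offset range (startOffset, endOffset].
  def read(startOffset: Long, endOffset: Long): Iterator[Seq[Any]]
}

trait StableSink {
  def write(batchId: Long, rows: Iterator[Seq[Any]]): Unit
}

// A toy single-threaded "engine". Any engine, RDD-based or not, only needs to
// drive these calls, so it can be replaced without touching connectors.
object LocalEngine {
  // Runs one micro-batch after offset `from` and returns the new offset.
  def runOnce(source: StableSource, sink: StableSink, batchId: Long, from: Long): Long = {
    val to = source.latestOffset()
    if (to > from) sink.write(batchId, source.read(from, to))
    to
  }
}

// Toy connectors for the demonstration below.
final class CounterSource(upTo: Long) extends StableSource {
  def latestOffset(): Long = upTo
  def read(startOffset: Long, endOffset: Long): Iterator[Seq[Any]] =
    (startOffset + 1 to endOffset).iterator.map(i => Seq[Any](i))
}

final class CollectingSink extends StableSink {
  val received = ArrayBuffer.empty[(Long, Seq[Any])]
  def write(batchId: Long, rows: Iterator[Seq[Any]]): Unit =
    rows.foreach(row => received += ((batchId, row)))
}

val collector = new CollectingSink
val next = LocalEngine.runOnce(new CounterSource(3L), collector, batchId = 0L, from = 0L)
println(next) // 3
println(collector.received.map(_._2))
```

A plain `Iterator` keeps the sketch small; a real stable interface would more likely expose a columnar or row abstraction with its own versioning guarantees, which is the kind of trade-off the ticket is about.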



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
