[ https://issues.apache.org/jira/browse/SPARK-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720146#comment-15720146 ]
Cody Koeninger commented on SPARK-18682:
----------------------------------------

Isn't this a duplicate of https://issues.apache.org/jira/browse/SPARK-18386?

Regarding limit, that would need to be a per-partition limit, either explicit or implicit (divide n by the number of partitions); see the second sketch below.

> Batch Source for Kafka
> ----------------------
>
> Key: SPARK-18682
> URL: https://issues.apache.org/jira/browse/SPARK-18682
> Project: Spark
> Issue Type: New Feature
> Components: SQL, Structured Streaming
> Reporter: Michael Armbrust
>
> Today, you can start a stream that reads from Kafka. However, given Kafka's
> configurable retention period, sometimes you just want to read all of the
> data that is available right now. As such, we should add a version that
> works with {{spark.read}} as well.
> The options should be the same as for the streaming Kafka source, with the
> following differences:
> - {{startingOffsets}} should default to earliest, and should not allow
> {{latest}} (which would always be empty).
> - {{endingOffsets}} should also be allowed and should default to {{latest}};
> the same assign JSON format as {{startingOffsets}} should also be accepted.
> It would be really good if something like {{.limit(n)}} were enough to
> prevent all of the data from being read (this might just work).
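For reference, a minimal sketch of what the proposed batch read might look like, assuming the option names carry over from the streaming Kafka source as the description suggests; the broker address and topic name are placeholders:

{code:scala}
// Sketch only, not a final API: assumes the batch reader reuses the
// streaming source's option names, as proposed in the description.
import org.apache.spark.sql.SparkSession

object KafkaBatchReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-batch-read").getOrCreate()

    // Bounded read of whatever the topic currently retains.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092") // placeholder brokers
      .option("subscribe", "topic1")                   // placeholder topic
      .option("startingOffsets", "earliest")           // proposed default
      .option("endingOffsets", "latest")               // proposed default
      // Or pin the end per partition with the same assign JSON format
      // accepted by startingOffsets, e.g.:
      // .option("endingOffsets", """{"topic1":{"0":50}}""")
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()

    spark.stop()
  }
}
{code}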
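And a hypothetical illustration of the implicit per-partition limit mentioned in the comment above, with a global n divided across a placeholder partition count:

{code:scala}
// Hypothetical arithmetic only: how a global .limit(n) could be turned
// into an implicit per-partition cap, as suggested in the comment.
val n = 1000          // rows requested via .limit(n)
val numPartitions = 8 // partitions in the topic (placeholder)

// Ceiling division so the partitions together cover at least n rows;
// the small overshoot would be trimmed after the scan.
val perPartitionLimit = (n + numPartitions - 1) / numPartitions // = 125
{code}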