[ https://issues.apache.org/jira/browse/SPARK-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720146#comment-15720146 ]
Cody Koeninger commented on SPARK-18682:
----------------------------------------

Isn't this a duplicate of https://issues.apache.org/jira/browse/SPARK-18386?

Regarding limit, that would need to be a per-partition limit, either explicit or implicit (divide n by the number of partitions); see the second sketch below.

> Batch Source for Kafka
> ----------------------
>
> Key: SPARK-18682
> URL: https://issues.apache.org/jira/browse/SPARK-18682
> Project: Spark
> Issue Type: New Feature
> Components: SQL, Structured Streaming
> Reporter: Michael Armbrust
>
> Today, you can start a stream that reads from Kafka. However, given Kafka's
> configurable retention period, sometimes you just want to read all of the
> data that is available right now. As such, we should add a version that
> works with {{spark.read}} as well.
> The options should be the same as for the streaming Kafka source, with the
> following differences:
> - {{startingOffsets}} should default to earliest, and should not allow
> {{latest}} (which would always be empty).
> - {{endingOffsets}} should also be allowed and should default to {{latest}};
> the same assign JSON format as {{startingOffsets}} should also be accepted.
> It would be really good if something like {{.limit(n)}} were enough to
> prevent all of the data from being read (this might just work).
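For reference, a minimal sketch of what the proposed batch read might look like, assuming the option names carry over from the streaming Kafka source as the description suggests; the broker address and topic name are placeholders:

{code:scala}
// Sketch only, not a final API: assumes the batch reader reuses the
// streaming source's option names, as proposed in the description.
import org.apache.spark.sql.SparkSession

object KafkaBatchReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-batch-read").getOrCreate()

    // Bounded read of whatever the topic currently retains.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092") // placeholder brokers
      .option("subscribe", "topic1")                   // placeholder topic
      .option("startingOffsets", "earliest")           // proposed default
      .option("endingOffsets", "latest")               // proposed default
      // Or pin the end per partition with the same assign JSON format
      // accepted by startingOffsets, e.g.:
      // .option("endingOffsets", """{"topic1":{"0":50}}""")
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()

    spark.stop()
  }
}
{code}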
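And a hypothetical illustration of the implicit per-partition limit mentioned in the comment above, with a global n divided across a placeholder partition count:

{code:scala}
// Hypothetical arithmetic only: how a global .limit(n) could be turned
// into an implicit per-partition cap, as suggested in the comment.
val n = 1000          // rows requested via .limit(n)
val numPartitions = 8 // partitions in the topic (placeholder)

// Ceiling division so the partitions together cover at least n rows;
// the small overshoot would be trimmed after the scan.
val perPartitionLimit = (n + numPartitions - 1) / numPartitions // = 125
{code}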