[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558506#comment-15558506 ]

Cody Koeninger commented on SPARK-17812:
----------------------------------------

So I'm willing to do this work, mostly because I've already done it, but there 
are some user interface issues here that need to be figured out.

You already chose the name "startingOffset" for specifying the equivalent of 
auto.offset.reset. Now we're looking at adding actual, explicit starting 
offsets. Furthermore, it should be possible to specify starting offsets for 
some partitions while relying on the equivalent of auto.offset.reset for the 
remaining, unspecified ones (the existing DStream does this).

What do you expect the configuration for this to look like? I can see a couple 
of options:

1. Try to cram everything into startingOffset with some horrible string-based 
DSL
2. Have a separate option for specifying actual starting offsets, with a name 
that makes clear what it is but doesn't reuse "startingOffset". For the 
value, I'd guess JSON of some kind, e.g. { "topicfoo" : { "0": 1234, "1": 
4567 }}
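To make option 2 concrete, here is a minimal sketch of how such a JSON value could be combined with an auto.offset.reset-style fallback for partitions it does not mention. The function name `resolve_starting_offsets` and its parameters are hypothetical, not an existing Spark API; the sketch assumes Kafka's sentinel convention of -1 for latest and -2 for earliest.

```python
import json

# Assumed sentinels, following Kafka's convention: -1 = latest, -2 = earliest.
LATEST = -1
EARLIEST = -2


def resolve_starting_offsets(spec_json, assigned, reset="latest"):
    """Hypothetical helper: map each assigned (topic, partition) to a start offset.

    spec_json: option value such as '{"topicfoo": {"0": 1234, "1": 4567}}'.
    assigned:  iterable of (topic, partition) pairs the query will read.
    reset:     fallback for partitions not named in spec_json,
               mirroring auto.offset.reset ("earliest" or "latest").
    """
    if reset not in ("earliest", "latest"):
        raise ValueError("reset must be 'earliest' or 'latest'")
    fallback = EARLIEST if reset == "earliest" else LATEST
    spec = json.loads(spec_json) if spec_json else {}
    # JSON object keys are strings, so convert partition ids back to ints.
    explicit = {
        (topic, int(partition)): int(offset)
        for topic, partitions in spec.items()
        for partition, offset in partitions.items()
    }
    # Explicit offsets win; everything else gets the reset sentinel.
    return {tp: explicit.get(tp, fallback) for tp in assigned}


assigned = [("topicfoo", 0), ("topicfoo", 1), ("topicfoo", 2)]
print(resolve_starting_offsets('{"topicfoo": {"0": 1234, "1": 4567}}', assigned, "earliest"))
# {('topicfoo', 0): 1234, ('topicfoo', 1): 4567, ('topicfoo', 2): -2}
```

Keeping the explicit offsets and the reset policy as two separate inputs avoids overloading one string with both meanings, which is the problem with option 1.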

Somewhat relatedly, Assign needs a way of specifying TopicPartitions.
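One possible shape for the Assign value, sketched under the assumption that it would also be JSON (a topic mapped to a list of partition ids). The format and the `parse_assign` name are hypothetical illustrations, not a confirmed Spark option.

```python
import json


def parse_assign(spec_json):
    """Hypothetical parser: '{"topicfoo": [0, 1], "topicbar": [2]}' ->
    sorted list of (topic, partition) pairs, with basic validation."""
    spec = json.loads(spec_json)
    if not isinstance(spec, dict):
        raise ValueError("expected a JSON object mapping topic -> list of partitions")
    pairs = set()
    for topic, partitions in spec.items():
        if not isinstance(partitions, list):
            raise ValueError(f"partitions for {topic!r} must be a list")
        for p in partitions:
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(p, int) or isinstance(p, bool) or p < 0:
                raise ValueError(f"invalid partition {p!r} for topic {topic!r}")
            pairs.add((topic, p))
    return sorted(pairs)


print(parse_assign('{"topicfoo": [1, 0], "topicbar": [2]}'))
# [('topicbar', 2), ('topicfoo', 0), ('topicfoo', 1)]
```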

As for the idea of seeking back X offsets, I think it'd be better to look at 
offset time indexing.
If you are going to do the X-offsets-back idea, note that the offsets -1L and 
-2L already have special meaning (latest and earliest, respectively), so it's 
going to be confusing to allow negative numbers in an interface that specifies 
offsets.
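A small sketch of the ambiguity, assuming a naive scheme in which negative values mean "seek back" except for the -1/-2 sentinels; both functions are hypothetical illustrations. It also shows the alternative of a separate, non-negative "back by X" count resolved against the partition's earliest and latest (log-end) offsets.

```python
# Assumed sentinels, following Kafka's convention.
LATEST = -1
EARLIEST = -2


def interpret(value):
    """Naive scheme: -1/-2 are sentinels, other negatives mean 'seek back'.
    Note there is no way to express 'back by 1' or 'back by 2'."""
    if value == LATEST:
        return ("latest", None)
    if value == EARLIEST:
        return ("earliest", None)
    if value < 0:
        return ("back", -value)
    return ("absolute", value)


def seek_back(earliest, latest, x):
    """Separate option: start x offsets before the log-end offset `latest`,
    clamped so we never start before the earliest available offset."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return max(earliest, latest - x)


print(interpret(-3))               # ('back', 3)
print(interpret(-1))               # ('latest', None), not "one offset back"
print(seek_back(100, 1000, 50))    # 950
print(seek_back(100, 1000, 5000))  # 100
```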


> More granular control of starting offsets
> -----------------------------------------
>
>                 Key: SPARK-17812
>                 URL: https://issues.apache.org/jira/browse/SPARK-17812
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Right now you can only run a Streaming Query starting from either the 
> earliest or latest offsets available at the moment the query is started.  
> Sometimes this is a lot of data.  It would be nice to be able to do the 
> following:
>  - seek back {{X}} offsets in the stream from the moment the query starts
>  - seek to user specified offsets



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
