[ 
https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573341#comment-15573341
 ] 

Ofir Manor commented on SPARK-17812:
------------------------------------

Thanks Cody! Great to have a concrete example.
I have some comments, but it's mostly bikeshedding.
1.  subscribe vs. subscribePattern --> personally, I would combine them both 
into "subscribe" - no need to burden the user with the different Kafka API 
nuances. It can take a list of discrete topics or a pattern.
2. It would be much clearer if "assign" was called subscribeSomething, so the 
user would choose one "subscribe.." and one (or more) "starting...".
Not sure I have a good name though - subscribeCustom?
You could even use the regular subscribe for that (and be smarter with the 
pattern matching) - I think it would just work, and if someone tries to be 
funny (combining asterisks and partitions) we could just raise an error.
3. I like startingTime... pretty neat.
We could hypothetically add {{.option("startingMessages", long)}} to support 
Michael's "just start with a 1000 recent messages"...
4. As I said before, I'd rather have all starting* options be mutually 
exclusive. Yes, it blocks some edge cases, on purpose, but it makes the API 
and code way clearer (think about startingMessages interacting with 
startingOffsets etc).
I think it would be easier to change our minds and allow multiple starting* 
options in the future (opening up all sorts of esoteric combinations) than to 
clean it up later if users find it confusing and unnecessary.
Anyway, as long as it is functional I'm good with it, even if it is less aesthetic.
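
To make points 1, 2 and 4 concrete, here is a rough Python sketch of the kind 
of option validation they imply. It is an illustration only, not Spark code: 
the function names, the "topic:partition" syntax for explicit partitions, and 
the set of characters treated as pattern markers are all assumptions made up 
for this example.

```python
# Hypothetical starting* option names, taken from the discussion above.
STARTING_OPTIONS = ("startingOffsets", "startingTime", "startingMessages")

# Characters assumed to mark a regex pattern. "." is deliberately excluded
# because it is legal in Kafka topic names.
PATTERN_CHARS = set("*+?[](){}|^$\\")


def classify_subscribe(value):
    """Classify a single "subscribe" value (assumed syntax).

    Returns ("topics", [names]), ("partitions", [(topic, partition)])
    or ("pattern", pattern_string).
    """
    entries = [e.strip() for e in value.split(",") if e.strip()]
    if not entries:
        raise ValueError("subscribe must name at least one topic")

    has_pattern = any(PATTERN_CHARS & set(e) for e in entries)
    has_partition = any(":" in e for e in entries)

    # Point 2: mixing a pattern with explicit partitions is rejected.
    if has_pattern and has_partition:
        raise ValueError("cannot combine a pattern with explicit partitions")
    if has_pattern:
        return ("pattern", value.strip())
    if has_partition:
        if not all(":" in e for e in entries):
            raise ValueError("cannot mix plain topics and topic:partition")
        parsed = []
        for e in entries:
            topic, partition = e.rsplit(":", 1)
            parsed.append((topic, int(partition)))
        return ("partitions", parsed)
    return ("topics", entries)


def check_starting(options):
    """Point 4: allow at most one starting* option; return its name or None."""
    chosen = [name for name in STARTING_OPTIONS if name in options]
    if len(chosen) > 1:
        raise ValueError("starting* options are mutually exclusive: "
                         + ", ".join(chosen))
    return chosen[0] if chosen else None
```

With this sketch, a single "subscribe" value covers plain topic lists, 
patterns and explicit partitions, and passing both startingTime and 
startingOffsets fails fast instead of leaving the interaction undefined.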

> More granular control of starting offsets (assign)
> --------------------------------------------------
>
>                 Key: SPARK-17812
>                 URL: https://issues.apache.org/jira/browse/SPARK-17812
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Right now you can only run a Streaming Query starting from either the 
> earliest or latest offsets available at the moment the query is started.  
> Sometimes this is a lot of data.  It would be nice to be able to do the 
> following:
>  - seek to user specified offsets for manually specified topicpartitions


