[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

marmbrus Thu, 22 Sep 2016 18:42:07 -0700

GitHub user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    Comparable requirement removed in #15207.
    
    > I think in the absence of prior information about the position in a topicpartition, you start a new batch on topic B starting from wherever the consumer's position was at the time it acquired the subscription, which might not be 0. I.e. you call position() before seekToEnd().
    
    Why do you care when it acquired it? If it appeared in between the last batch and now, don't you want to consume all of the available data from it? Otherwise the answer is going to depend on the specifics of when you see the topic, which seems counter to the model of Structured Streaming.
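    A minimal Python sketch of the policy argued for above, assuming offsets are tracked per (topic, partition) as plain dicts. `plan_batch` and its parameters are hypothetical, and this is not the Kafka source's actual code. Partitions already seen resume from the previous batch's end offset, while partitions that appeared since then start from their earliest available offset, so the plan does not depend on when the new topic happened to be noticed.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for Kafka's TopicPartition: (topic, partition).
TopicPartition = Tuple[str, int]


def plan_batch(
    prev_end: Dict[TopicPartition, int],
    earliest: Dict[TopicPartition, int],
    latest: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Tuple[int, int]]:
    """Return the [start, end) offset range to read for each partition in the next batch."""
    ranges = {}
    for tp, end in latest.items():
        if tp in prev_end:
            # Known partition: continue exactly where the previous batch stopped.
            start = prev_end[tp]
        else:
            # Partition appeared since the last batch: read all data still available,
            # regardless of the consumer's position when it picked up the subscription.
            start = earliest[tp]
        ranges[tp] = (start, end)
    return ranges
```

    For example, if topic A was read up to offset 100 and topic B appears with offsets 5 through 39 still retained, the next batch reads A from 100 and B from 5, not from wherever the consumer's position happened to be.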
    
    > I think the main thing that would be confusing is to specify topics in one way (custom-delimited string) for one configuration, and in another way (structured json) for another configuration.
    
    Are you proposing users have to type `"[\"topic1\", \"topic2\"]"` (or pull in a JSON library) instead of `"topic1,topic2"`? Seems we could pretty seamlessly add support for JSON in the future, while still making the common case easy to type.
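    A minimal sketch of how the comma-separated form could stay the default while JSON is accepted later, assuming a JSON value is recognized by a leading `[`. `parse_topics` is a hypothetical helper, not an existing Spark option parser.

```python
import json
from typing import List


def parse_topics(value: str) -> List[str]:
    """Parse a topic list given either as "topic1,topic2" or as a JSON array of names."""
    text = value.strip()
    if text.startswith("["):
        # Hypothetical future extension: a JSON array of topic names.
        topics = json.loads(text)
        if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
            raise ValueError("expected a JSON array of topic names")
    else:
        # Common case: plain comma-separated names, easy to type.
        topics = [t.strip() for t in text.split(",")]
    # Drop empty entries such as the one produced by a trailing comma.
    return [t for t in topics if t]
```

    Splitting on commas is unambiguous here because Kafka topic names may only contain ASCII letters, digits, `.`, `_`, and `-`, so the JSON form would only be needed for richer, structured configuration.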
