[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

koeninger Thu, 22 Jan 2015 15:39:12 -0800

Github user koeninger commented on the pull request:

    https://github.com/apache/spark/pull/3798#issuecomment-71122072
  
    I need to know, perhaps even at the driver, what the ending offset is in
    order to be able to commit it.
    
    I also have several use cases where I want to end a batch at a specific
    point which may or may not be "now".
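A minimal sketch of the two requirements above. It assumes a hypothetical `OffsetRange` type, with plain dicts standing in for real Kafka metadata and offset-store calls; it is not Spark's or Kafka's API. The driver fixes each batch's per-partition ending offset before the batch runs. That end is either "now" (the latest available offset) or an explicit earlier end point, so the driver knows exactly which offsets to commit once the batch succeeds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical half-open range [from_offset, until_offset) for one topic-partition."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        return self.until_offset - self.from_offset


def plan_batch(
    committed: Dict[int, int],
    latest: Dict[int, int],
    topic: str,
    end_at: Optional[Dict[int, int]] = None,
) -> List[OffsetRange]:
    """Decide on the driver where the next batch ends, before any data is read.

    committed: next offset to read per partition (the previous batch's committed end).
    latest:    latest available offset per partition, i.e. "now".
    end_at:    optional explicit end point per partition, capped at `latest`.
    """
    ranges = []
    for partition, start in sorted(committed.items()):
        end = latest[partition]
        if end_at is not None and partition in end_at:
            end = min(end, end_at[partition])
        # Never produce a negative range if the requested end is behind the start.
        ranges.append(OffsetRange(topic, partition, start, max(start, end)))
    return ranges


def commit(ranges: List[OffsetRange]) -> Dict[int, int]:
    """Once the batch's output succeeds, the already-known ending offsets
    become the new committed starting points."""
    return {r.partition: r.until_offset for r in ranges}
```

For example, `plan_batch({0: 100, 1: 250}, {0: 180, 1: 300}, "events", end_at={0: 150})` ends partition 0 at 150 rather than "now". Partition 1 still runs to 300. The upper bounds are known on the driver before any executor reads data.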
    On Jan 22, 2015 5:33 PM, "Hari Shreedharan" <notificati...@github.com>
    wrote:
    
    > OK.
    >
    > Just a thought: Do you think there might be a way to avoid the spikes?
    > Once the current RDD is checkpointed, create a "new" pending RDD, which
    > continuously receives data, until the compute method is called. When
    > compute gets called, the last offset we received can be considered to be
    > the upper bound, and the data is now available for transformations. That
    > way, we could spread out network transfers from Kafka over a larger
    > period.
    >
    > Not sure if there are holes in that algorithm, but it looks almost
    > equivalent to the current model, no?
    >
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/3798#issuecomment-71121466>.
    >
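The quoted proposal can be sketched for a single partition as follows. The `PendingBatch` class and its methods are hypothetical, chosen only to illustrate the idea. Records are buffered as they arrive between checkpoints, and when `compute()` is called the last received offset becomes the batch's upper bound.

```python
from typing import List, Optional, Tuple


class PendingBatch:
    """Hypothetical per-partition buffer that receives data continuously
    until compute() is called (sketch of the quoted proposal)."""

    def __init__(self, start_offset: int):
        self.start_offset = start_offset
        self.values: List[str] = []
        self.last_offset: Optional[int] = None

    def receive(self, offset: int, value: str) -> None:
        # Called as messages arrive, spreading network transfers over the interval.
        self.values.append(value)
        self.last_offset = offset

    def compute(self) -> Tuple[int, int, List[str]]:
        # The upper bound (exclusive) is whatever has arrived so far;
        # with nothing received, the batch is empty.
        until = self.start_offset if self.last_offset is None else self.last_offset + 1
        return self.start_offset, until, list(self.values)
```

In this model the ending offset is fixed by whatever has been received when `compute()` runs. It is not chosen up front. That is the difference the reply above points to, since it needs the ending offset known in advance and sometimes set to a point other than "now".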



