GitHub user koeninger commented on the pull request:

    https://github.com/apache/spark/pull/4805#issuecomment-77882744
  
    As it stands now, Spark stores no offsets unless you're
    checkpointing.  Does it really make sense to have an option to
    automatically store offsets in Kafka, but not store offsets in the
    checkpoint?  Failure recovery in that case depends on user-provided
    starting offsets (or starting at the beginning / end of the log).  If
    someone has the sophistication to get offsets from Kafka in order to
    provide them as a starting point, they probably have the sophistication to
    save offsets to Kafka themselves in the job.
    
    If offsets are only sent to Kafka when they are also stored in the
    checkpoint, then does sending offsets to Kafka in compute() also make
    sense?  Yes, processing can lag behind, but those offsets are already in
    the queue to get processed at least once.
    
    I'm not 100% sure of the answer to this; it's more a question of desired
    behavior, but that's why I brought it up.
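
The two commit points discussed above can be compared with a small toy model. This is only a sketch under stated assumptions, not the code in the pull request: `OffsetTracker`, its method names, and the `events-0` partition key are hypothetical, and the `committed` dictionary merely simulates offsets stored in Kafka. No Spark or Kafka API is used.

```python
from typing import Dict


class OffsetTracker:
    """Toy stand-in for a stream that records per-batch Kafka offsets.

    All names are hypothetical; `committed` only simulates the offsets
    that would be stored in Kafka.
    """

    def __init__(self, commit_on_generate: bool) -> None:
        self.commit_on_generate = commit_on_generate
        # Batch time -> {partition: end offset of that batch}.
        self.pending: Dict[int, Dict[str, int]] = {}
        self.committed: Dict[str, int] = {}

    def _commit(self, until_offsets: Dict[str, int]) -> None:
        for partition, until in until_offsets.items():
            # Never move a committed offset backwards.
            self.committed[partition] = max(until, self.committed.get(partition, 0))

    def on_batch_generated(self, batch_time: int, until_offsets: Dict[str, int]) -> None:
        # Analogous to committing in compute(): the batch is queued, not yet processed.
        self.pending[batch_time] = dict(until_offsets)
        if self.commit_on_generate:
            self._commit(until_offsets)

    def on_batch_completed(self, batch_time: int) -> None:
        # Analogous to committing from a listener once the batch has finished.
        until_offsets = self.pending.pop(batch_time, None)
        if until_offsets is not None and not self.commit_on_generate:
            self._commit(until_offsets)


eager = OffsetTracker(commit_on_generate=True)
lazy = OffsetTracker(commit_on_generate=False)
for tracker in (eager, lazy):
    tracker.on_batch_generated(1000, {"events-0": 50})

print(eager.committed)  # {'events-0': 50}
print(lazy.committed)   # {}
lazy.on_batch_completed(1000)
print(lazy.committed)   # {'events-0': 50}
```

In this model the eager tracker's committed offset runs ahead of processing, which is only safe if the queued batch survives a failure (for example, through the checkpoint). The lazy tracker never commits an offset for a batch that has not completed.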
    
    
    
    On Mon, Mar 9, 2015 at 12:14 AM, Saisai Shao <notificati...@github.com>
    wrote:
    
    > Hi @koeninger <https://github.com/koeninger> , would you please review
    > this again? Thanks a lot and appreciate your time.
    >
    > I am still using the HashMap for the Time -> offset mapping. Since
    > checkpoint data is only updated when checkpointing is enabled, I hope
    > this can also work without checkpointing enabled.
    >
    > I am also still using StreamingListener to update the offsets, for the
    > reason mentioned earlier.
    >
    > I also updated the configuration name, though I'm not sure it is suitable.
    >
    > Thanks a lot.


