Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    Ok, so this kind of thing is why I was concerned about the copy-paste-and-randomly-change-things approach to developing this module.
    
    > (5) Topics are deleted when a Spark job is running, which may cause 
OffsetOutOfRangeException. (I'm not sure if there are more types of exceptions, 
may need to investigate) Solution: log a warning. Note: if a Spark job fails, 
then the query will fail as well.
    
    OffsetOutOfRangeException basically means you asked Kafka for an offset, 
and it wasn't there.  The most common reason this happens isn't that a topic 
got deleted; it's that messages expired out of retention before they were 
read.
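
    For concreteness, here's a minimal standalone sketch (my own illustration, 
not code from this PR; the broker address, group id, topic, and offset are made 
up) of how the exception surfaces with a plain Kafka consumer when 
`auto.offset.reset` is `none` and the requested offset is no longer available:

    ```scala
    import java.util.{Collections, Properties}

    import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetOutOfRangeException}
    import org.apache.kafka.common.TopicPartition

    object OffsetOutOfRangeDemo {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
        props.put("group.id", "oor-demo")                  // assumed group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
        // "none" means the consumer throws instead of silently resetting to earliest/latest.
        props.put("auto.offset.reset", "none")

        val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
        val tp = new TopicPartition("some-topic", 0)       // assumed topic
        consumer.assign(Collections.singletonList(tp))
        // Seek to an offset that has already expired out of retention (or never existed).
        consumer.seek(tp, 12345L)
        try {
          consumer.poll(1000L)  // throws OffsetOutOfRangeException: the offset simply isn't there
        } finally {
          consumer.close()
        }
      }
    }
    ```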
    
    Just logging at warning level and continuing in this situation is 
catastrophically bad: the kind of bad where someone loses their paying job, 
not just their Spark job.
    
    The existing Kafka DStream integrations, which have been around for 7 Spark 
versions, just let that exception be thrown, resulting in errors and failed 
tasks, which makes it pretty obvious that something is really wrong.
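
    To make the contrast concrete, here's an illustrative sketch (not the 
actual DStream or PR code) of the two behaviors: letting the exception 
propagate so the task fails loudly, versus logging a warning and continuing, 
which silently skips whatever records were in the missing offset range:

    ```scala
    import org.slf4j.LoggerFactory

    import org.apache.kafka.clients.consumer.OffsetOutOfRangeException

    object FetchBehaviors {
      private val log = LoggerFactory.getLogger(getClass)

      // DStream-style behavior: the exception propagates, the task fails, and the
      // problem is loudly visible as task / job errors.
      def fetchOrFail(fetch: () => Iterator[Array[Byte]]): Iterator[Array[Byte]] =
        fetch()

      // The behavior proposed in quoted item (5): warn and keep going, which means
      // the records that fell inside the missing range are silently dropped.
      def fetchOrWarn(fetch: () => Iterator[Array[Byte]]): Iterator[Array[Byte]] =
        try fetch() catch {
          case e: OffsetOutOfRangeException =>
            log.warn("Offsets out of range; continuing and skipping missing data", e)
            Iterator.empty
        }
    }
    ```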
    
    If you think that behavior is incorrect, let's figure out a unified 
behavior for dealing with exceptional situations that break fundamental 
assumptions, and make it really obvious to users how to get the behavior 
they need across both modules.  But having structured streaming behave in 
significantly different ways seems like a recipe for trouble.
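
    As one purely hypothetical illustration of what "really obvious to users" 
could look like, a user-facing switch on the source (the option name 
`failOnDataLoss` and its semantics are an assumption here, not something this 
thread defines):

    ```scala
    import org.apache.spark.sql.SparkSession

    object KafkaSourceOptionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-source-sketch").getOrCreate()

        val df = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  // assumed broker address
          .option("subscribe", "some-topic")                     // assumed topic
          // Hypothetical option: fail the query loudly when offsets are missing,
          // instead of a warning buried in executor logs.
          .option("failOnDataLoss", "true")
          .load()

        df.printSchema()
        spark.stop()
      }
    }
    ```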

