Daniel Strassler created SPARK-11211:
----------------------------------------

             Summary: Kafka - offsetOutOfRange forces to largest
                 Key: SPARK-11211
                 URL: https://issues.apache.org/jira/browse/SPARK-11211
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.5.1, 1.3.1
            Reporter: Daniel Strassler


This problem concerns how DStreams that use the Direct Approach of connecting
to a Kafka topic behave when they request an offset that does not exist on the
topic.  Currently, the "auto.offset.reset" configuration value appears to be
ignored, and the default value of "largest" is always used.
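
For reference, a minimal sketch (Scala, against the Spark 1.x direct API) of
the setup being described; the broker list and topic name are placeholders:

{code}
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("DirectKafkaExample")
val ssc = new StreamingContext(conf, Seconds(10))

// "smallest" should mean: on an out-of-range offset, fall back to the
// beginning of the topic.  Observed behavior is as if "largest" were set.
val kafkaParams = Map[String, String](
  "metadata.broker.list" -> "broker1:9092",   // placeholder broker list
  "auto.offset.reset"    -> "smallest")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("someTopic"))         // placeholder topic
{code}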
 
When using the Direct Approach to connect to a Kafka topic with a DStream,
even with the Kafka configuration "auto.offset.reset" set to "smallest", the
behavior on a kafka.common.OffsetOutOfRangeException is to move the next
offset to be consumed to the largest offset on the Kafka topic.  The exception
also appears to be swallowed rather than propagated up to the driver, so a
workaround triggered by the propagated error cannot be implemented either.
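
For example, a driver-side guard like the following (continuing the sketch
above) never fires, because the exception does not reach the driver:

{code}
stream.foreachRDD { rdd => rdd.count() }   // placeholder processing
ssc.start()
try {
  ssc.awaitTermination()
} catch {
  // Never reached for an out-of-range fetch: the stream silently
  // jumps to the largest offset instead of surfacing the error.
  case e: kafka.common.OffsetOutOfRangeException =>
    // would log, alert, or reset offsets here
    throw e
}
{code}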
 
The current behavior of resetting to largest means that any data already on
the Kafka topic when the exception is thrown is skipped (lost to consumption);
only data produced to the topic after the exception will be consumed.  Two
possible fixes are listed below.
 
Fix 1:  When "auto.offset.reset" is set to "smallest", the DStream should set
the next consumed offset to the smallest offset value available on the Kafka
topic.
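
One way Fix 1 could look (a sketch only, not the actual Spark internals): on
the out-of-range error, query Kafka for the earliest offset still available
on the partition and resume from there when "auto.offset.reset" is "smallest".

{code}
import kafka.api.{OffsetRequest, PartitionOffsetRequestInfo}
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

// Hypothetical helper using the Kafka 0.8 SimpleConsumer API: return the
// smallest offset still present on the given partition.
def earliestOffset(consumer: SimpleConsumer, tp: TopicAndPartition): Long = {
  val req = OffsetRequest(
    Map(tp -> PartitionOffsetRequestInfo(OffsetRequest.EarliestTime, 1)))
  consumer.getOffsetsBefore(req).partitionErrorAndOffsets(tp).offsets.head
}
{code}

The fetch would then be retried starting at that offset instead of jumping to
the largest one.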
 
Fix 2:  Propagate the error to the driver to allow it to react as it deems
appropriate.
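
If the error were propagated, the driver could, for example, rebuild the
stream from explicit starting offsets using the fromOffsets overload of
createDirectStream (a sketch; earliestOffsets is a hypothetical map built
per partition with a helper like the one sketched under Fix 1):

{code}
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder

// earliestOffsets: Map[TopicAndPartition, Long], hypothetical, built with
// the earliestOffset helper above.
val restarted = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, earliestOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
{code}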


