Randall Hauch created KAFKA-7873:
------------------------------------

             Summary: KafkaBasedLog's consumer should always seek to beginning when starting
                 Key: KAFKA-7873
                 URL: https://issues.apache.org/jira/browse/KAFKA-7873
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 2.1.0
            Reporter: Randall Hauch
            Assignee: Randall Hauch


KafkaBasedLog expects that callers set the `group.id` for the consumer 
configuration, and does not itself set the `group.id` if the caller does not 
explicitly do so. However, 
[KIP-289|https://cwiki.apache.org/confluence/display/KAFKA/KIP-289%3A+Improve+the+default+group+id+behavior+in+KafkaConsumer]
 changed the default for `group.id` from a blank string to null. This changes 
how KafkaBasedLog behaves when no `group.id` is set: the consumer now treats 
that case as deprecated and issues a warning when no `group.id` is specified.

When KafkaBasedLog starts up, it should always start from the beginning of the 
topic and consume to the end. The consumer's logic for where to start is always:
# explicit seek
# committed offset (skipped if group.id is null)
# auto reset behavior

and currently Connect does not explicitly seek to the beginning and instead 
relies upon `auto.offset.reset=earliest`. However, if a `group.id` is specified 
*and* there are committed offsets, then the consumer will start from the 
committed offsets rather than from the beginning. If a `group.id` is not 
specified, then the committed-offset step is skipped and the auto reset 
behavior should take effect.

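However, leaving `group.id` unset now triggers the warning and a possible 
exception. To avoid that, KafkaBasedLog should always call 
{{consumer.seekToBeginning()}} during startup, so that it reads from the 
beginning of the topic whether or not a `group.id` and committed offsets exist.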
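
The resolution order listed above, and the effect of the proposed explicit 
seek, can be illustrated with a small toy model. This is only a sketch and is 
not Kafka or Connect code: the class {{StartPositionModel}}, the method 
{{resolveStart}}, the group name "connect-cluster" and the offsets used are all 
hypothetical names and values chosen for illustration.

```java
import java.util.OptionalLong;

/**
 * Toy model (NOT Kafka code) of how a consumer picks its starting offset.
 * All names here are hypothetical and exist only for illustration.
 */
final class StartPositionModel {

    enum AutoOffsetReset { EARLIEST, LATEST }

    /**
     * Applies the three rules from the issue description in order:
     * explicit seek, then committed offset, then auto reset.
     */
    static long resolveStart(OptionalLong explicitSeek,
                             String groupId,
                             OptionalLong committedOffset,
                             AutoOffsetReset reset,
                             long beginningOffset,
                             long endOffset) {
        // 1. An explicit seek always wins.
        if (explicitSeek.isPresent()) {
            return explicitSeek.getAsLong();
        }
        // 2. A committed offset is used next, but is skipped if group.id is null.
        if (groupId != null && committedOffset.isPresent()) {
            return committedOffset.getAsLong();
        }
        // 3. Otherwise fall back to the auto reset behavior.
        return reset == AutoOffsetReset.EARLIEST ? beginningOffset : endOffset;
    }

    public static void main(String[] args) {
        OptionalLong noSeek = OptionalLong.empty();
        OptionalLong committed = OptionalLong.of(42L);

        // group.id set and offsets committed: auto.offset.reset=earliest is ignored.
        System.out.println(resolveStart(noSeek, "connect-cluster", committed,
                AutoOffsetReset.EARLIEST, 0L, 100L)); // prints 42

        // group.id null: the committed-offset step is skipped, auto reset applies.
        System.out.println(resolveStart(noSeek, null, committed,
                AutoOffsetReset.EARLIEST, 0L, 100L)); // prints 0

        // Explicit seek to the beginning: the result no longer depends on group.id.
        System.out.println(resolveStart(OptionalLong.of(0L), "connect-cluster", committed,
                AutoOffsetReset.EARLIEST, 0L, 100L)); // prints 0
    }
}
```

The third case corresponds to the proposed fix: once the start position is set 
by an explicit seek, neither the presence of a `group.id` nor any committed 
offsets affect where reading begins.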



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
