Dimitrij Denissenko created KAFKA-3251:
------------------------------------------

             Summary: Requesting committed offsets results in inconsistent results
                 Key: KAFKA-3251
                 URL: https://issues.apache.org/jira/browse/KAFKA-3251
             Project: Kafka
          Issue Type: Bug
          Components: offset manager
    Affects Versions: 0.9.0.0
            Reporter: Dimitrij Denissenko


Hi,

I am using github.com/Shopify/sarama to retrieve the committed offsets for a 
high-volume topic, but the bug seems to originate in Kafka itself.

I have written a small test that queries the offsets of all partitions of one 
topic every second. The request looks like this:

{code}
OffsetFetchRequest{
  ConsumerGroup: "my-group-name",
  Version: 1,
  TopicPartitions: []TopicPartition{
    {TopicName: "logs", Partitions: []int32{0, 1, 2, 3, 4, 5, 6, 7}},
  },
}
{code}

Most of the time the responses are correct, but every 10 minutes or so there 
is a glitch. I am not familiar with the Kafka internals, but it looks like a 
race condition. Here's my log output:

{code}
...

2016/02/19 09:48:10 topic=logs partition=00 error=0 offset=206567925
2016/02/19 09:48:10 topic=logs partition=01 error=0 offset=206671019
2016/02/19 09:48:10 topic=logs partition=02 error=0 offset=206567995
2016/02/19 09:48:10 topic=logs partition=03 error=0 offset=205785315
2016/02/19 09:48:10 topic=logs partition=04 error=0 offset=206526677
2016/02/19 09:48:10 topic=logs partition=05 error=0 offset=206713764
2016/02/19 09:48:10 topic=logs partition=06 error=0 offset=206524006
2016/02/19 09:48:10 topic=logs partition=07 error=0 offset=206629121

2016/02/19 09:48:11 topic=logs partition=00 error=0 offset=206572870
2016/02/19 09:48:11 topic=logs partition=01 error=0 offset=206675966
2016/02/19 09:48:11 topic=logs partition=02 error=0 offset=206573267
2016/02/19 09:48:11 topic=logs partition=03 error=0 offset=205790613
2016/02/19 09:48:11 topic=logs partition=04 error=0 offset=206531841
2016/02/19 09:48:11 topic=logs partition=05 error=0 offset=206718513
2016/02/19 09:48:11 topic=logs partition=06 error=0 offset=206529762
2016/02/19 09:48:11 topic=logs partition=07 error=0 offset=206634037

2016/02/19 09:48:12 topic=logs partition=00 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=01 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=02 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=03 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=04 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=05 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=06 error=0 offset=-1
2016/02/19 09:48:12 topic=logs partition=07 error=0 offset=-1

2016/02/19 09:48:13 topic=logs partition=00 error=0 offset=-1
2016/02/19 09:48:13 topic=logs partition=01 error=0 offset=206686020
2016/02/19 09:48:13 topic=logs partition=02 error=0 offset=206583861
2016/02/19 09:48:13 topic=logs partition=03 error=0 offset=205800480
2016/02/19 09:48:13 topic=logs partition=04 error=0 offset=206542733
2016/02/19 09:48:13 topic=logs partition=05 error=0 offset=206728251
2016/02/19 09:48:13 topic=logs partition=06 error=0 offset=206534794
2016/02/19 09:48:13 topic=logs partition=07 error=0 offset=206643853

2016/02/19 09:48:14 topic=logs partition=00 error=0 offset=206584533
2016/02/19 09:48:14 topic=logs partition=01 error=0 offset=206690275
2016/02/19 09:48:14 topic=logs partition=02 error=0 offset=206588902
2016/02/19 09:48:14 topic=logs partition=03 error=0 offset=205805413
2016/02/19 09:48:14 topic=logs partition=04 error=0 offset=206542733
2016/02/19 09:48:14 topic=logs partition=05 error=0 offset=206733144
2016/02/19 09:48:14 topic=logs partition=06 error=0 offset=206540275
2016/02/19 09:48:14 topic=logs partition=07 error=0 offset=206649392
...
{code}

As you can see, the returned error code is 0, and there is no obvious reason why 
the returned offsets are suddenly -1 instead of the committed values.
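
The pattern in the log above can be flagged on the client side. The following is only a minimal sketch: `glitchDetector` and its methods are hypothetical names, not part of sarama or Kafka, and the sample values are the partition 0 offsets from the log. It assumes that an offset of -1 with error code 0 is suspicious once a non-negative offset has already been seen for the same partition.

```go
package main

import "fmt"

// noOffset is the value the log shows during the glitch.
const noOffset int64 = -1

// glitchDetector remembers the last non-negative offset seen per partition.
// Hypothetical helper for illustration; not part of sarama or Kafka.
type glitchDetector struct {
	last map[int32]int64
}

func newGlitchDetector() *glitchDetector {
	return &glitchDetector{last: make(map[int32]int64)}
}

// observe records one fetched offset and reports whether it looks like the
// glitch: error code 0 and offset -1 after a real offset was already seen.
func (d *glitchDetector) observe(partition int32, errCode int16, offset int64) bool {
	_, seen := d.last[partition]
	if errCode == 0 && offset == noOffset && seen {
		return true
	}
	if errCode == 0 && offset >= 0 {
		d.last[partition] = offset
	}
	return false
}

func main() {
	d := newGlitchDetector()
	// Partition 0 offsets from the log, one value per poll.
	polls := []int64{206567925, 206572870, -1, -1, 206584533}
	for i, off := range polls {
		if d.observe(0, 0, off) {
			fmt.Printf("poll %d: suspicious offset=-1, last good offset=%d\n", i, d.last[0])
		}
	}
}
```

A partition that has never reported a real offset is not flagged, since -1 is then a plausible answer for a group with no commits yet.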

I have also added debugging output to our offset committer to make sure the 
offsets we are sending are correct, and they are.

Any help is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)