Todd Palino created KAFKA-1987:
----------------------------------

             Summary: Potential race condition in partition creation
                 Key: KAFKA-1987
                 URL: https://issues.apache.org/jira/browse/KAFKA-1987
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 0.8.1.1
            Reporter: Todd Palino
            Assignee: Neha Narkhede


There appears to be a race condition when creating partitions with a 
replication factor of 2 or higher: the partition is not always created on the 
leader before it is created on the follower. What appears to be happening is 
that the follower processes the command to create the partition before the 
leader does, so when the follower starts its replica fetcher, the fetch fails 
with an UnknownTopicOrPartitionException.
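
To make the suspected ordering concrete, here is a minimal toy model in Python. 
It is not Kafka code: the Broker class, the exception class, and the 
leader_ready_after parameter are all hypothetical and exist only to show how a 
follower that creates the partition first would see repeated fetch failures 
until the leader catches up.

```python
# Toy model of the suspected ordering problem. Not Kafka code; every name
# here is hypothetical and chosen only for illustration.


class UnknownTopicOrPartitionError(Exception):
    """Stands in for kafka.common.UnknownTopicOrPartitionException."""


class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.partitions = set()  # partitions this broker has created locally

    def create_partition(self, tp):
        self.partitions.add(tp)

    def handle_fetch(self, tp):
        # A leader that has not created the partition yet cannot serve fetches.
        if tp not in self.partitions:
            raise UnknownTopicOrPartitionError(tp)
        return []  # partition exists but holds no messages yet


def follower_fetch_attempts(leader, follower, tp, leader_ready_after):
    """Follower creates the partition first, then fetches until it succeeds.

    leader_ready_after is the number of fetch attempts made before the leader
    processes its own create command. Returns the number of failed fetches.
    """
    follower.create_partition(tp)
    failures = 0
    attempt = 0
    while True:
        if attempt == leader_ready_after:
            leader.create_partition(tp)  # the leader finally catches up
        try:
            leader.handle_fetch(tp)
            return failures
        except UnknownTopicOrPartitionError:
            failures += 1
        attempt += 1


leader, follower = Broker(1555), Broker(1551)
failed = follower_fetch_attempts(leader, follower, ("topicA", 30),
                                 leader_ready_after=5)
print(f"failed fetches before the leader was ready: {failed}")
```

With leader_ready_after=5 the model reports five failed fetches, after which 
the fetch succeeds on its own, which is the same shape as the behaviour 
described in this report.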

I am creating a large number of partitions on a cluster to prepare it for data 
being mirrored from another cluster, so a sizeable number of create and alter 
commands are being sent sequentially. The replica fetchers do eventually start 
up properly, but it seems like the controller should issue the command to 
create the partition to the leader, wait for confirmation, and only then issue 
the command to create the partition to the followers.
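
The leader-first ordering suggested above could be sketched roughly as follows. 
This is only an illustration in Python under simplifying assumptions, not the 
Kafka controller: Broker, create_partition, and create_partition_leader_first 
are hypothetical names, and the "wait for confirmation" step is modelled as a 
plain synchronous return value rather than a network acknowledgement.

```python
# Sketch of the suggested leader-first ordering. Hypothetical names only;
# this is not how the Kafka controller is implemented.


class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.partitions = set()  # partitions this broker has created locally

    def create_partition(self, tp):
        """Create the partition locally; the return value is the acknowledgement."""
        self.partitions.add(tp)
        return True


def create_partition_leader_first(leader, followers, tp):
    """Return the order in which brokers were told to create the partition."""
    order = []

    # Step 1: send the create command to the leader and wait for confirmation.
    if not leader.create_partition(tp):
        raise RuntimeError(f"leader {leader.broker_id} did not confirm {tp}")
    order.append(("leader", leader.broker_id))

    # Step 2: only after the leader has confirmed, send the command to the
    # followers, so their first fetch finds the partition on the leader.
    for follower in followers:
        follower.create_partition(tp)
        order.append(("follower", follower.broker_id))

    return order


order = create_partition_leader_first(Broker(1555), [Broker(1551)],
                                      ("topicA", 30))
print(order)
```

The trade-off is an extra round trip per partition before the followers are 
notified, which may matter when many partitions are created in sequence.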

2015/02/26 21:11:50.413 INFO [LogManager] [kafka-request-handler-12] [kafka-server] [] Created log for partition [topicA,30] in /path_to/i001_caches with properties {segment.index.bytes -> 10485760, file.delete.delay.ms -> 60000, segment.bytes -> 268435456, flush.ms -> 10000, delete.retention.ms -> 86400000, index.interval.bytes -> 4096, retention.bytes -> -1, min.insync.replicas -> 1, cleanup.policy -> delete, unclean.leader.election.enable -> true, segment.ms -> 43200000, max.message.bytes -> 1000000, flush.messages -> 20000, min.cleanable.dirty.ratio -> 0.5, retention.ms -> 86400000, segment.jitter.ms -> 0}.
2015/02/26 21:11:50.418 WARN [Partition] [kafka-request-handler-12] [kafka-server] [] Partition [topicA,30] on broker 1551: No checkpointed highwatermark is found for partition [topicA,30]
2015/02/26 21:11:50.418 INFO [ReplicaFetcherManager] [kafka-request-handler-12] [kafka-server] [] [ReplicaFetcherManager on broker 1551] Removed fetcher for partitions [topicA,30]
2015/02/26 21:11:50.418 INFO [Log] [kafka-request-handler-12] [kafka-server] [] Truncating log topicA-30 to offset 0.
2015/02/26 21:11:50.450 INFO [ReplicaFetcherManager] [kafka-request-handler-12] [kafka-server] [] [ReplicaFetcherManager on broker 1551] Added fetcher for partitions List([[topicA,30], initOffset 0 to broker id:1555,host:host1555.example.com,port:10251] )
2015/02/26 21:11:50.615 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
2015/02/26 21:11:50.616 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
2015/02/26 21:11:50.618 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
2015/02/26 21:11:50.620 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
2015/02/26 21:11:50.621 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
