[jira] [Commented] (KAFKA-1987) Potential race condition in partition creation

2015-03-09 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353502#comment-14353502
 ] 

Joel Koshy commented on KAFKA-1987:
---

I actually think it would be worthwhile to improve the error logging. E.g., if 
it is a replica fetcher thread, then instead of logging an error, provide a 
more meaningful info message, e.g., "Could not fetch from partition [topicA, 
partition 30] as the leader may not have created the topic yet." (or something 
clearer if possible).
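
To make the suggestion concrete, here is a minimal sketch in Python (not Kafka's 
actual Scala code) of choosing the log level from the thread type and the error. 
The function name describe_fetch_error and its parameters are hypothetical and 
exist only for illustration; the thread name, partition, and exception name are 
taken from the log excerpt in the issue description.

```python
import logging

# Name of the error seen in the issue's log excerpt.
UNKNOWN_TOPIC_OR_PARTITION = "UnknownTopicOrPartitionException"


def describe_fetch_error(thread_name, topic, partition, error_name):
    """Return a (level, message) pair for a fetch error (hypothetical helper).

    A replica fetcher that hits an unknown topic or partition is most likely
    just ahead of the leader, so it gets an INFO message instead of an ERROR.
    """
    is_replica_fetcher = thread_name.startswith("ReplicaFetcherThread")
    if is_replica_fetcher and error_name == UNKNOWN_TOPIC_OR_PARTITION:
        message = (
            "Could not fetch from partition [%s,%d] as the leader may not "
            "have created the topic yet." % (topic, partition)
        )
        return logging.INFO, message
    # Everything else keeps the existing ERROR-level behaviour.
    message = "Error for partition [%s,%d]: %s" % (topic, partition, error_name)
    return logging.ERROR, message


level, message = describe_fetch_error(
    "ReplicaFetcherThread-0-1555", "topicA", 30, UNKNOWN_TOPIC_OR_PARTITION
)
logging.getLogger("kafka-sketch").log(level, message)
```

Any other thread type, or any other error on a replica fetcher, still produces 
an ERROR in this sketch, so real failures would not be hidden.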

 Potential race condition in partition creation
 --

 Key: KAFKA-1987
 URL: https://issues.apache.org/jira/browse/KAFKA-1987
 Project: Kafka
  Issue Type: Bug
  Components: controller
Reporter: Todd Palino

 I am finding that there appears to be a race condition when creating 
 partitions, with replication factor 2 or higher, between the creation of the 
 partition on the leader and the follower. What appears to be happening is 
 that the follower is processing the command to create the partition before 
 the leader does, and when the follower starts the replica fetcher, it fails 
 with an UnknownTopicOrPartitionException.
 The situation is that I am creating a large number of partitions on a 
 cluster, preparing it for data being mirrored from another cluster. So there 
 are a sizeable number of create and alter commands being sent sequentially. 
 Eventually, the replica fetchers start up properly. But it seems like the 
 controller should issue the command to create the partition to the leader, 
 wait for confirmation, and then issue the command to create the partition to 
 the followers.
 2015/02/26 21:11:50.413 INFO [LogManager] [kafka-request-handler-12] 
 [kafka-server] [] Created log for partition [topicA,30] in 
 /path_to/i001_caches with properties {segment.index.bytes -> 10485760, 
 file.delete.delay.ms -> 6, segment.bytes -> 268435456, flush.ms -> 1, 
 delete.retention.ms -> 8640, index.interval.bytes -> 4096, 
 retention.bytes -> -1, min.insync.replicas -> 1, cleanup.policy -> delete, 
 unclean.leader.election.enable -> true, segment.ms -> 4320, 
 max.message.bytes -> 100, flush.messages -> 2, 
 min.cleanable.dirty.ratio -> 0.5, retention.ms -> 8640, segment.jitter.ms 
 -> 0}.
 2015/02/26 21:11:50.418 WARN [Partition] [kafka-request-handler-12] 
 [kafka-server] [] Partition [topicA,30] on broker 1551: No checkpointed 
 highwatermark is found for partition [topicA,30]
 2015/02/26 21:11:50.418 INFO [ReplicaFetcherManager] 
 [kafka-request-handler-12] [kafka-server] [] [ReplicaFetcherManager on broker 
 1551] Removed fetcher for partitions [topicA,30]
 2015/02/26 21:11:50.418 INFO [Log] [kafka-request-handler-12] [kafka-server] 
 [] Truncating log topicA-30 to offset 0.
 2015/02/26 21:11:50.450 INFO [ReplicaFetcherManager] 
 [kafka-request-handler-12] [kafka-server] [] [ReplicaFetcherManager on broker 
 1551] Added fetcher for partitions List([[topicA,30], initOffset 0 to broker 
 id:1555,host:host1555.example.com,port:10251] )
 2015/02/26 21:11:50.615 ERROR [ReplicaFetcherThread] 
 [ReplicaFetcherThread-0-1555] [kafka-server] [] 
 [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 
 1555:class kafka.common.UnknownTopicOrPartitionException
 2015/02/26 21:11:50.616 ERROR [ReplicaFetcherThread] 
 [ReplicaFetcherThread-0-1555] [kafka-server] [] 
 [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 
 1555:class kafka.common.UnknownTopicOrPartitionException
 2015/02/26 21:11:50.618 ERROR [ReplicaFetcherThread] 
 [ReplicaFetcherThread-0-1555] [kafka-server] [] 
 [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 
 1555:class kafka.common.UnknownTopicOrPartitionException
 2015/02/26 21:11:50.620 ERROR [ReplicaFetcherThread] 
 [ReplicaFetcherThread-0-1555] [kafka-server] [] 
 [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 
 1555:class kafka.common.UnknownTopicOrPartitionException
 2015/02/26 21:11:50.621 ERROR [ReplicaFetcherThread] 
 [ReplicaFetcherThread-0-1555] [kafka-server] [] 
 [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 
 1555:class kafka.common.UnknownTopicOrPartitionException
 2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1987) Potential race condition in partition creation

2015-02-27 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341226#comment-14341226
 ] 

Neha Narkhede commented on KAFKA-1987:
--

+1

 Potential race condition in partition creation
 --

 Key: KAFKA-1987
 URL: https://issues.apache.org/jira/browse/KAFKA-1987
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 0.8.1.1
Reporter: Todd Palino
Assignee: Neha Narkhede



[jira] [Commented] (KAFKA-1987) Potential race condition in partition creation

2015-02-27 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341342#comment-14341342
 ] 

Jiangjie Qin commented on KAFKA-1987:
-

I agree with Joel that we probably can leave it as is. When a cluster has just 
come up, it is still possible that a follower comes up before the leader is 
up. I remember we've handled this case.
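
As a rough illustration of why a follower that starts before its leader is 
harmless, here is a toy sketch in Python of a fetch loop that simply retries 
until the leader knows the partition. It is not Kafka's replica fetcher code; 
UnknownTopicOrPartition, fetch_with_retry, and make_leader are hypothetical 
names, and the attempt limit and backoff are assumptions made for the example.

```python
import time


class UnknownTopicOrPartition(Exception):
    """Stand-in for the error the leader returns before it has the partition."""


def fetch_with_retry(fetch, max_attempts=5, backoff_seconds=0.0):
    """Call fetch() until it succeeds; return (attempts_used, records)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, fetch()
        except UnknownTopicOrPartition:
            if attempt == max_attempts:
                raise  # give up only after the final attempt
            time.sleep(backoff_seconds)


def make_leader(ready_after):
    """Build a fake leader fetch function that fails its first ready_after calls."""
    calls = {"count": 0}

    def fetch():
        calls["count"] += 1
        if calls["count"] <= ready_after:
            raise UnknownTopicOrPartition("[topicA,30]")
        return []  # a newly created partition has no records yet

    return fetch


# The leader becomes ready after three failed fetches from the follower.
attempts, records = fetch_with_retry(make_leader(ready_after=3))
print(attempts, records)
```

The early failures correspond to the repeated errors in the reported log, and 
the later success matches the observation that the replica fetchers 
eventually start up properly.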



[jira] [Commented] (KAFKA-1987) Potential race condition in partition creation

2015-02-26 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339225#comment-14339225
 ] 

Joel Koshy commented on KAFKA-1987:
---

Looking at the code, I think this is possible when the follower receives the 
LeaderAndIsr request first, but it is probably harmless since the replica 
fetcher will just wait for the leader to process its leader transition. We 
could have the controller wait until the leader responds before sending the 
LeaderAndIsr request to followers, but I am not sure that is worth doing.
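
For reference, the alternative ordering mentioned above could look roughly 
like the following Python sketch. It is a toy model, not the controller's 
actual code: the Broker class, handle_leader_and_isr, and 
create_partition_leader_first are hypothetical names, and the acknowledgement 
is modelled as a plain return value rather than a network response.

```python
class Broker:
    """Toy broker that records which partitions it has created."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.partitions = set()

    def handle_leader_and_isr(self, partition):
        # Create the partition locally and acknowledge the request.
        self.partitions.add(partition)
        return True


def create_partition_leader_first(partition, leader, followers):
    """Send the request to the leader, wait for its ack, then to followers."""
    events = []
    acked = leader.handle_leader_and_isr(partition)
    if not acked:
        raise RuntimeError("leader did not acknowledge %r" % (partition,))
    events.append(("leader", leader.broker_id))
    for follower in followers:
        # By now the leader already hosts the partition, so a follower's
        # first fetch cannot hit an unknown-partition error.
        follower.handle_leader_and_isr(partition)
        events.append(("follower", follower.broker_id))
    return events


leader = Broker(1555)
follower = Broker(1551)
events = create_partition_leader_first(("topicA", 30), leader, [follower])
print(events)
```

The cost of this ordering is an extra round trip per partition before 
followers are told, which is part of why it may not be worth doing when 
followers already recover on their own.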
