[jira] [Commented] (KAFKA-1509) Restart of destination broker after unreplicated partition move leaves partitions without leader

2014-07-18 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066567#comment-14066567
 ] 

Guozhang Wang commented on KAFKA-1509:
--

Yes, this is still a valid issue, though it may be a tricky one. I looked 
through the controller code: when a new broker starts up, the controller 
should use the offline-elector to elect new leaders for the offline 
partitions hosted on that broker, so that they come back online. But this 
step is somehow not executed; instead, the periodic preferred leader elector 
runs later and fails, since the new broker is not in the ISR yet.

This could be related to some bugs in the delete-topic logic, but more 
investigation is needed to find the right fix for this issue.
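
To make the difference between the two election paths concrete, here is a 
simplified model in Python. It is only a sketch of the behaviour described 
above, not the controller's actual code: the names PartitionState, 
offline_elect, preferred_elect and ElectionFailed are made up for 
illustration, and the fallback to a live assigned replica when the ISR is 
empty is my reading of how the offline elector behaves in 0.8.1.

```python
from dataclasses import dataclass
from typing import List, Set


class ElectionFailed(Exception):
    """Raised when a selector cannot choose a leader (hypothetical name)."""


@dataclass
class PartitionState:
    assigned_replicas: List[int]  # assigned brokers, preferred replica first
    isr: List[int]                # in-sync replicas recorded for the partition
    leader: int                   # -1 means the partition has no leader


def offline_elect(p: PartitionState, live: Set[int]) -> int:
    """Model of the offline elector: any live replica is acceptable."""
    live_isr = [b for b in p.isr if b in live]
    if live_isr:
        return live_isr[0]
    # Assumption: with no live ISR member, fall back to a live assigned replica.
    live_assigned = [b for b in p.assigned_replicas if b in live]
    if live_assigned:
        return live_assigned[0]
    raise ElectionFailed("no assigned replica is alive")


def preferred_elect(p: PartitionState, live: Set[int]) -> int:
    """Model of the preferred elector: the first assigned replica must be in the ISR."""
    preferred = p.assigned_replicas[0]
    if p.leader == preferred:
        raise ElectionFailed("preferred replica is already the leader")
    if preferred in live and preferred in p.isr:
        return preferred
    raise ElectionFailed("preferred replica %d is not alive and in the ISR" % preferred)
```

With the state from the report (Replicas: 7, empty Isr, Leader: -1) and 
broker 7 alive, offline_elect picks broker 7 while preferred_elect raises 
ElectionFailed. That matches the symptom: if only the preferred path runs, 
the partition stays without a leader.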

> Restart of destination broker after unreplicated partition move leaves 
> partitions without leader
> 
>
> Key: KAFKA-1509
> URL: https://issues.apache.org/jira/browse/KAFKA-1509
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1.1
> Reporter: Albert Strasheim
> Labels: newbie++
> Attachments: controller2.log
>
>
> This should be reasonably easy to reproduce.
> Make a Kafka cluster with a few machines.
> Create a topic with partitions on these machines. No replication.
> Bring up one more Kafka node.
> Move some or all of the partitions onto this new broker:
> kafka-reassign-partitions.sh --generate --zookeeper zk:2181 
> --topics-to-move-json-file move.json --broker-list 
> kafka-reassign-partitions.sh --zookeeper 36cfqd1.in.cfops.it:2181 
> --reassignment-json-file reassign.json --execute
> Wait until broker is the leader for all the partitions you moved.
> Send some data to the partitions. It all works.
> Shut down the broker that just received the data. Start it back up.
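
The contents of move.json and reassign.json are not shown in the report. 
For anyone reproducing the steps above, a minimal sketch of how the two 
inputs could be generated follows. The topic name "test" and broker id 7 
come from the describe output in the report; the helper names are made up, 
and the JSON layout is the one I understand kafka-reassign-partitions.sh to 
expect in 0.8.1, so check it against the tool's documentation.

```python
import json
from typing import Dict, List


def topics_to_move(topics: List[str]) -> Dict:
    """Build the input for --topics-to-move-json-file (assumed layout)."""
    return {"version": 1, "topics": [{"topic": t} for t in topics]}


def reassignment(topic: str, partitions: List[int], target_broker: int) -> Dict:
    """Build the input for --reassignment-json-file with a single,
    unreplicated replica per partition on target_broker (assumed layout)."""
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": [target_broker]}
            for p in partitions
        ],
    }


def write_json(path: str, payload: Dict) -> None:
    """Write one of the payloads above to a file."""
    with open(path, "w") as f:
        json.dump(payload, f)
```

For the scenario in the report this would be 
write_json("move.json", topics_to_move(["test"])) and 
write_json("reassign.json", reassignment("test", [0, 1], 7)).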
>  
> {code}
> Topic:test  PartitionCount:2  ReplicationFactor:1  Configs:
>   Topic: test  Partition: 0  Leader: -1  Replicas: 7  Isr:
>   Topic: test  Partition: 1  Leader: -1  Replicas: 7  Isr:
> {code}
> Leader for topic test never gets elected even though this node is the only 
> node that knows about the topic.
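
To check for this state without reading the describe output by eye, a 
small parser along these lines could help. It is a sketch that assumes the 
per-partition line layout shown in the output above (Topic, Partition, 
Leader in that order); the function name leaderless_partitions is made up.

```python
import re
from typing import List, Tuple

# Matches a per-partition line; the topic summary line does not match because
# it contains "PartitionCount:" rather than "Partition:".
_PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s*Leader:\s*(-?\d+)"
)


def leaderless_partitions(describe_output: str) -> List[Tuple[str, int]]:
    """Return (topic, partition) pairs whose leader is reported as -1."""
    result = []
    for line in describe_output.splitlines():
        m = _PARTITION_LINE.search(line)
        if m and int(m.group(3)) == -1:
            result.append((m.group(1), int(m.group(2))))
    return result
```

Fed the output from the report, it would list both partitions of topic 
test, since each shows Leader: -1.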
> Some logs:
> {code}
> Jun 26 23:18:07 localhost kafka: INFO [Socket Server on Broker 7], Started 
> (kafka.network.SocketServer)
> Jun 26 23:18:07 localhost kafka: INFO [Socket Server on Broker 7], Started 
> (kafka.network.SocketServer)
> Jun 26 23:18:07 localhost kafka: INFO [ControllerEpochListener on 7]: 
> Initialized controller epoch to 53 and zk version 52 
> (kafka.controller.ControllerEpochListener)
> Jun 26 23:18:07 localhost kafka: INFO Will not load MX4J, mx4j-tools.jar is 
> not in the classpath (kafka.utils.Mx4jLoader$)
> Jun 26 23:18:07 localhost kafka: INFO Will not load MX4J, mx4j-tools.jar is 
> not in the classpath (kafka.utils.Mx4jLoader$)
> Jun 26 23:18:07 localhost kafka: INFO [Controller 7]: Controller starting up 
> (kafka.controller.KafkaController)
> Jun 26 23:18:07 localhost kafka: INFO conflict in /controller data: 
> {"version":1,"brokerid":7,"timestamp":"1403824687354"} stored data: 
> {"version":1,"brokerid":4,"timestamp":"1403297911725"} (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO conflict in /controller data: 
> {"version":1,"brokerid":7,"timestamp":"1403824687354"} stored data: 
> {"version":1,"brokerid":4,"timestamp":"1403297911725"} (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO [Controller 7]: Controller startup 
> complete (kafka.controller.KafkaController)
> Jun 26 23:18:07 localhost kafka: INFO Registered broker 7 at path 
> /brokers/ids/7 with address xxx:9092. (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO Registered broker 7 at path 
> /brokers/ids/7 with address xxx:9092. (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO [Kafka Server 7], started 
> (kafka.server.KafkaServer)
> Jun 26 23:18:07 localhost kafka: INFO [Kafka Server 7], started 
> (kafka.server.KafkaServer)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leader:3,ISR:3,LeaderEpoch:14,ControllerEpoch:53),ReplicationFactor:1),AllReplicas:3)
>  for partition [requests,0] in response to UpdateMetadata request sent by 
> controller 4 epoch 53 with correlation id 70 (state.change.logger)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leader:1,ISR:1,LeaderEpoch:11,ControllerEpoch:53),ReplicationFactor:1),AllReplicas:1)
>  for partition [requests,13] in response to UpdateMetadata request sent by 
> controller 4 epoch 53 with correlation id 70 (state.change.logger)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leade

[jira] [Commented] (KAFKA-1509) Restart of destination broker after unreplicated partition move leaves partitions without leader

2014-07-18 Thread Albert Strasheim (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066411#comment-14066411
 ] 

Albert Strasheim commented on KAFKA-1509:
-

[~nmarasoiu] Yes, I think a fix is still needed.

> Restart of destination broker after unreplicated partition move leaves 
> partitions without leader
> 
>
> Key: KAFKA-1509
> URL: https://issues.apache.org/jira/browse/KAFKA-1509
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1.1
> Reporter: Albert Strasheim
> Labels: newbie++
> Attachments: controller2.log
>
>
> This should be reasonably easy to reproduce.
> Make a Kafka cluster with a few machines.
> Create a topic with partitions on these machines. No replication.
> Bring up one more Kafka node.
> Move some or all of the partitions onto this new broker:
> kafka-reassign-partitions.sh --generate --zookeeper zk:2181 
> --topics-to-move-json-file move.json --broker-list 
> kafka-reassign-partitions.sh --zookeeper 36cfqd1.in.cfops.it:2181 
> --reassignment-json-file reassign.json --execute
> Wait until broker is the leader for all the partitions you moved.
> Send some data to the partitions. It all works.
> Shut down the broker that just received the data. Start it back up.
>  
> {code}
> Topic:test  PartitionCount:2  ReplicationFactor:1  Configs:
>   Topic: test  Partition: 0  Leader: -1  Replicas: 7  Isr:
>   Topic: test  Partition: 1  Leader: -1  Replicas: 7  Isr:
> {code}
> Leader for topic test never gets elected even though this node is the only 
> node that knows about the topic.
> Some logs:
> {code}
> Jun 26 23:18:07 localhost kafka: INFO [Socket Server on Broker 7], Started 
> (kafka.network.SocketServer)
> Jun 26 23:18:07 localhost kafka: INFO [Socket Server on Broker 7], Started 
> (kafka.network.SocketServer)
> Jun 26 23:18:07 localhost kafka: INFO [ControllerEpochListener on 7]: 
> Initialized controller epoch to 53 and zk version 52 
> (kafka.controller.ControllerEpochListener)
> Jun 26 23:18:07 localhost kafka: INFO Will not load MX4J, mx4j-tools.jar is 
> not in the classpath (kafka.utils.Mx4jLoader$)
> Jun 26 23:18:07 localhost kafka: INFO Will not load MX4J, mx4j-tools.jar is 
> not in the classpath (kafka.utils.Mx4jLoader$)
> Jun 26 23:18:07 localhost kafka: INFO [Controller 7]: Controller starting up 
> (kafka.controller.KafkaController)
> Jun 26 23:18:07 localhost kafka: INFO conflict in /controller data: 
> {"version":1,"brokerid":7,"timestamp":"1403824687354"} stored data: 
> {"version":1,"brokerid":4,"timestamp":"1403297911725"} (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO conflict in /controller data: 
> {"version":1,"brokerid":7,"timestamp":"1403824687354"} stored data: 
> {"version":1,"brokerid":4,"timestamp":"1403297911725"} (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO [Controller 7]: Controller startup 
> complete (kafka.controller.KafkaController)
> Jun 26 23:18:07 localhost kafka: INFO Registered broker 7 at path 
> /brokers/ids/7 with address xxx:9092. (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO Registered broker 7 at path 
> /brokers/ids/7 with address xxx:9092. (kafka.utils.ZkUtils$)
> Jun 26 23:18:07 localhost kafka: INFO [Kafka Server 7], started 
> (kafka.server.KafkaServer)
> Jun 26 23:18:07 localhost kafka: INFO [Kafka Server 7], started 
> (kafka.server.KafkaServer)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leader:3,ISR:3,LeaderEpoch:14,ControllerEpoch:53),ReplicationFactor:1),AllReplicas:3)
>  for partition [requests,0] in response to UpdateMetadata request sent by 
> controller 4 epoch 53 with correlation id 70 (state.change.logger)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leader:1,ISR:1,LeaderEpoch:11,ControllerEpoch:53),ReplicationFactor:1),AllReplicas:1)
>  for partition [requests,13] in response to UpdateMetadata request sent by 
> controller 4 epoch 53 with correlation id 70 (state.change.logger)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leader:1,ISR:1,LeaderEpoch:4,ControllerEpoch:53),ReplicationFactor:2),AllReplicas:1,5)
>  for partition [requests_ipv6,5] in response to UpdateMetadata request sent 
> by controller 4 epoch 53 with correlation id 70 (state.change.logger)
> Jun 26 23:18:07 localhost kafka: TRACE Broker 7 cached leader info 
> (LeaderAndIsrInfo:(Leader:4,ISR:4,LeaderEpoch:13,ControllerEpoch:53),ReplicationFactor:2),AllReplicas:4,5)
>  for partition [requests_stored,7] in response to UpdateMetadata request sent 
> by controller 4 epoch 53 with correlation id 70 (state.change.logger)
> Jun 26 23:18:07 loca

[jira] [Commented] (KAFKA-1509) Restart of destination broker after unreplicated partition move leaves partitions without leader

2014-07-18 Thread Nicolae Marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066319#comment-14066319
 ] 

Nicolae Marasoiu commented on KAFKA-1509:
-

Is a fix still needed for this, do you know?
