[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2018-01-18 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331093#comment-16331093
 ] 

Jeff Widman commented on KAFKA-1120:


The issue description says "the broker will be in this weird state until it is 
restarted."

Couldn't this also be fixed by simply forcing a controller re-election, since 
the new controller would re-identify the leaders?

> Controller could miss a broker state change 
> 
>
> Key: KAFKA-1120
> URL: https://issues.apache.org/jira/browse/KAFKA-1120
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.8.1
>Reporter: Jun Rao
>Assignee: Mickael Maison
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-12-04 Thread Ramnatthan Alagappan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277309#comment-16277309
 ] 

Ramnatthan Alagappan commented on KAFKA-1120:
-

My way of consistently reproducing this issue may not be correct or desirable, 
but here are the steps. The key problem is the broker re-registering with ZK 
before two events happen on the controller: 1. ZkClient re-registering the 
callback for /brokers/ids (the callback is initially fired for the deletion of 
the broker), 2. ZkClient getting the children of /brokers/ids to check what has 
changed. This happens in the fireChildChangedEvents function in the ZkClient 
library. To consistently trigger the issue, I patched ZkClient with a sleep 
inside this function: Thread.sleep(5000); exists(path); List<String> children = 
getChildren(path). With this added delay, the deletion handler sleeps before 
re-registering the callback. If I restart the shut-down broker during this 
sleep, the restarted broker never appears in newBrokerIds.
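
The reason the injected sleep works can be sketched with a small, self-contained 
Python simulation (a toy model of ZooKeeper's one-shot child watches, not the 
real ZooKeeper or ZkClient API): a watch fires for the broker's deregistration 
and is cleared, and the re-registration that happens before the handler re-reads 
the children goes completely unnoticed.

```python
class TinyZnode:
    """Toy model of a znode's children with ZooKeeper-style one-shot watches
    (illustration only; class and method names are made up)."""
    def __init__(self, children):
        self.children = set(children)
        self.watcher = None

    def get_children(self, watcher=None):
        self.watcher = watcher           # a watch is (re-)armed by the read
        return set(self.children)

    def _fire(self):
        w, self.watcher = self.watcher, None   # one-shot: cleared on fire
        if w:
            w()

    def remove(self, child):
        self.children.discard(child)
        self._fire()

    def add(self, child):
        self.children.add(child)
        self._fire()


events = []
node = TinyZnode({1, 2})   # /brokers/ids with brokers 1 and 2 registered

node.get_children(watcher=lambda: events.append("fired"))
node.remove(2)   # broker 2 deregisters: watch fires once, then is cleared
node.add(2)      # broker 2 re-registers during the "sleep": no watch is armed

print(events)               # only one notification for two changes
print(node.get_children())  # looks unchanged when the handler finally re-reads
```

Two changes, one notification, and the final read shows the same children as 
before the restart, so a diff-based listener sees nothing to do.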



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-12-04 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277179#comment-16277179
 ] 

Mickael Maison commented on KAFKA-1120:
---

[~ramanala] Can you share your steps?



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-12-04 Thread Ramnatthan Alagappan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277146#comment-16277146
 ] 

Ramnatthan Alagappan commented on KAFKA-1120:
-

I ran into this issue and have a reproducible setup, irrespective of the number 
of partitions or nodes. [~onurkaraman]'s analysis in comment 
[#comment-16113645] is correct. The root cause is that the shutdown broker 
restarts and registers with ZK within a short interval of time. During this 
time, ZK delivers a callback for the deletion of the broker. Before ZkClient 
can re-establish the callback (by issuing a stat call), the broker registers 
with ZK. By the time ZkClient gets the /brokers/ids node from ZK, the shutdown 
broker already appears in /brokers/ids again. As a result, the shutdown broker 
appears both in curBrokerIds and liveOrShuttingDownBrokerIds, causing 
newBrokerIds to be empty, which triggers this problem. 
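
The set arithmetic described above can be illustrated directly (a Python 
paraphrase of the controller's BrokerChange diff logic; variable names follow 
this comment, not the actual Kafka source):

```python
def broker_change(cur_broker_ids, live_or_shutting_down_broker_ids):
    """Paraphrase of the controller's broker-change diff:
    new = in ZK but not cached as live; dead = cached as live but gone from ZK."""
    new_broker_ids = cur_broker_ids - live_or_shutting_down_broker_ids
    dead_broker_ids = live_or_shutting_down_broker_ids - cur_broker_ids
    return new_broker_ids, dead_broker_ids


live = {1, 2}  # controller's cached view: broker 2 still considered live

# Normal case: broker 2 is gone from ZK when the listener finally reads it.
assert broker_change({1}, live) == (set(), {2})

# Racy case: broker 2 deregistered AND re-registered before the read,
# so /brokers/ids again contains {1, 2} and both diffs are empty.
new, dead = broker_change({1, 2}, live)
assert (new, dead) == (set(), set())  # restarted broker is never treated as new
```

With both diffs empty, the controller runs neither its broker-failure nor its 
broker-join logic, leaving the restarted broker in the bad state described in 
the issue.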



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-11-28 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268731#comment-16268731
 ] 

Edoardo Comar commented on KAFKA-1120:
--

Interestingly, using [~wushujames]'s script from [#comment-16110002] on a 
development laptop running trunk code:
* with the suggested 2x5000 partitions, 2x replicated, the cluster is unstable: 
after sitting idle in a steady state for some 5-10 minutes, one or two of the 
brokers get disconnected from ZooKeeper, then reconnect and start a bounce 
where one or the other falls out of sync
* with a lower number of partitions (e.g. 2500 or 3500) the instability above 
doesn't show, and either a controlled shutdown with a short timeout or an 
ungraceful kill, followed by a broker restart, gets the cluster back in sync 
without issues





[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-11-27 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267123#comment-16267123
 ] 

Jun Rao commented on KAFKA-1120:


[~mimaison], thanks for helping out!



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-11-27 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267107#comment-16267107
 ] 

Mickael Maison commented on KAFKA-1120:
---

[~junrao] Thanks for the confirmation, in that case I'll grab it and start 
working on a fix!



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-11-27 Thread Mickael Maison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266652#comment-16266652
 ] 

Mickael Maison commented on KAFKA-1120:
---

We've hit this issue ([KAFKA-3944]) a number of times in our 0.10.2 clusters. 
When a broker is restarted too fast, it can receive a bunch of StopReplica 
requests from the controller, which is still processing the ControlledShutdown 
request. As [~wushujames] mentioned, pre-0.11 there is unfortunately no real 
way to be sure the controller is done processing the ControlledShutdown 
request, apart from waiting "long enough".

As far as I can tell, this issue should still be present in 1.0.0 (I haven't 
had a chance to try reproducing it yet). If so, the solution suggested by 
[~onurkaraman] of using the broker's session id to prevent a broker from 
processing requests meant for its previous generation should work.




[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-25 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142409#comment-16142409
 ] 

James Cheng commented on KAFKA-1120:


I'd actually be more comfortable simply delaying the startup of the broker 
until I am certain that the controller is done processing the 
ControlledShutdown events. I can understand and explain that better than the 
intricacies of the controller.

However, there is the question of how to actually detect that the controller 
is done processing the shutdown event. KAFKA-5135 would help me figure out 
whether the controller is still processing the shutdown event, but we aren't 
on 0.11 yet. So maybe it wouldn't be that easy.



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-25 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142407#comment-16142407
 ] 

James Cheng commented on KAFKA-1120:


It's not so rare in a cloud environment, where nodes may fail. What happens 
when a broker goes down hard, during the controller.socket.timeout.ms window? 
Will the controller keep attempting to do things over that channel (send 
LeaderAndIsr requests, etc.) and simply wait until they time out?




[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-25 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142400#comment-16142400
 ] 

Jun Rao commented on KAFKA-1120:


Yes, that's the right metric. The only downside of increasing the timeout is 
the latency of detecting TCP connection issues when a broker goes down hard 
(e.g. a power outage) while the controller is communicating with the brokers. 
This should be rare though.



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-25 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142355#comment-16142355
 ] 

James Cheng commented on KAFKA-1120:


Do you mean this mbean?

kafka.network:type=RequestMetrics,name=TotalTimeMs,request=ControlledShutdown

[~junrao], I think you've mentioned before that controller.socket.timeout.ms 
applies to *all* broker-controller communication, so not just 
ControlledShutdown requests, but LeaderAndIsr updates and the like. I'm 
hesitant to touch that setting. Although, with high partition counts, would 
increasing it be recommended? Most of those other requests are fairly quick, 
so I don't think they would benefit from increased socket timeouts. But then 
again, the increased timeout wouldn't hurt them either.

I think ControlledShutdown is one of the few synchronous operations, right?




[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-25 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142304#comment-16142304
 ] 

James Cheng commented on KAFKA-1120:


Onur, while you are figuring out a fix, can you recommend anything we can do to 
avoid triggering the scenario? We have high partition counts, and so are 
encountering long ControlledShutdown events, which means we are hitting this 
quite often.

It seems like the bug gets triggered when the broker gets back into zookeeper 
before the controlled shutdown has finished being processed. So I can either 
make ControlledShutdown get processed faster, or make the broker get into 
zookeeper slower. Is that right?

I can make ControlledShutdown get processed faster by reducing the partition 
count, for example. I can make the broker get into zookeeper slower either by 
making it take longer to shut down (increasing 
controlled.shutdown.retry.backoff.ms?) or by delaying startup 
("sleep 60 && ./bin/kafka-server-start.sh").



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-04 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114911#comment-16114911
 ] 

Onur Karaman commented on KAFKA-1120:
-

[~wushujames] I think Jun's comments and the redesign doc in KAFKA-5027 are 
sort of saying the same thing. The broker-generation concept has two use 
cases, which were sort of implied:
1. the controller uses broker generations to distinguish events from a broker 
across generations.
2. controller-to-broker requests include the broker generation so that brokers 
can ignore requests that applied to a former generation.

While I think czxids will work for the 1st use case, I don't think we can 
naively reuse the czxid for the 2nd use case. The reason is a bit silly: 
zookeeper's CreateResponse only provides the path; it doesn't provide the 
created znode's Stat, so you have to do a separate lookup to find out the 
znode's czxid.

If we want to solve both use cases with the same approach, I think we have a 
couple of options:
1. maybe we can get away with using czxids by doing a multi-op when registering 
brokers, transactionally creating the znode and reading it back to get the 
czxid of the znode we just created.
2. we can instead use the session id as the broker generation. The controller 
can infer the broker's generation by observing the broker znode's 
ephemeralOwner property. Brokers can determine their generation by looking up 
the underlying zookeeper client's session id, which is just 
ZooKeeper.getSessionId(). The ephemeralOwner of an ephemeral znode is the 
client's session id, which is why this would work.
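
Option 2 could be sketched as follows (an illustrative Python model, not actual 
Kafka or ZooKeeper code; the class and field names here are made up): the 
controller stamps each request with the generation it observed as the broker 
znode's ephemeralOwner, and the broker drops any request whose generation 
doesn't match its current session id.

```python
from dataclasses import dataclass


@dataclass
class ControllerRequest:
    payload: str
    broker_generation: int   # ephemeralOwner observed on the broker znode


class Broker:
    def __init__(self, session_id):
        # In the real client this would come from ZooKeeper.getSessionId().
        self.session_id = session_id
        self.applied = []

    def handle(self, req):
        # Ignore requests stamped with a previous generation.
        if req.broker_generation != self.session_id:
            return False
        self.applied.append(req.payload)
        return True


# The controller saw the broker's old session (generation 100) and queued a
# request; meanwhile the broker restarted with a new ZK session (generation 101).
broker = Broker(session_id=101)
stale = ControllerRequest("StopReplica", broker_generation=100)
fresh = ControllerRequest("LeaderAndIsr", broker_generation=101)

assert broker.handle(stale) is False   # request for the old generation dropped
assert broker.handle(fresh) is True    # current-generation request applied
```

The design choice here is that the generation comes for free with the ephemeral 
znode: no extra counter needs to be stored or coordinated, since a new session 
id is guaranteed per broker restart.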



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-04 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114678#comment-16114678
 ] 

James Cheng commented on KAFKA-1120:


[~onurkaraman], what do you think of Jun's suggestions in his [10/Jun/16 
16:28|#comment-15325496] and [15/Dec/16 02:10|#comment-15750982] comments above?



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-03 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113645#comment-16113645
 ] 

Onur Karaman commented on KAFKA-1120:
-

Alright I might know what's happening. Here's the red flag:
{code}
> grep -r "Newly added brokers" .
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:40:09,121] INFO 
[Controller 1]: Newly added brokers: 1, deleted brokers: , all live brokers: 1 
(kafka.controller.KafkaController)
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:40:27,172] INFO 
[Controller 1]: Newly added brokers: 2, deleted brokers: , all live brokers: 
1,2 (kafka.controller.KafkaController)
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:47:15,215] INFO 
[Controller 1]: Newly added brokers: , deleted brokers: , all live brokers: 1,2 
(kafka.controller.KafkaController)
./kafka_2.11-0.11.0.0/logs/controller.log:[2017-08-03 13:47:17,927] INFO 
[Controller 1]: Newly added brokers: , deleted brokers: , all live brokers: 1,2 
(kafka.controller.KafkaController)
{code}

Here's the relevant code in BrokerChange.process:
{code}
val curBrokers = zkUtils.getAllBrokersInCluster().toSet
val curBrokerIds = curBrokers.map(_.id)
val liveOrShuttingDownBrokerIds = controllerContext.liveOrShuttingDownBrokerIds
val newBrokerIds = curBrokerIds -- liveOrShuttingDownBrokerIds
val deadBrokerIds = liveOrShuttingDownBrokerIds -- curBrokerIds
{code}

Basically the ControlledShutdown event took so long to process that the 
BrokerChange corresponding to the killed broker (3rd BrokerChange in the above 
snippet) and the BrokerChange corresponding to the restarted broker (4th 
BrokerChange in the above snippet) were queued up waiting for 
ControlledShutdown's completion. By the time these BrokerChange events got 
processed, the restarted broker was already registered in zookeeper, so the 
broker appeared both in controllerContext.liveOrShuttingDownBrokerIds and 
among the brokers listed in zookeeper. This means the controller executed 
neither onBrokerFailure for the 3rd BrokerChange nor onBrokerJoin for the 4th.

I'm not sure of the fix. Broker generations as defined in the redesign doc in 
KAFKA-5027 would work but I'm not sure if it's strictly required.
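
The race described above can be sketched as a minimal simulation (hypothetical Python, not Kafka code; the function mirrors the set differences in the Scala snippet quoted earlier):

```python
# Hypothetical sketch of the set-difference logic in BrokerChange.process,
# showing why a fast broker bounce is invisible to the controller.

def broker_change(cur_broker_ids, live_or_shutting_down):
    """Mimics newBrokerIds / deadBrokerIds from BrokerChange.process."""
    new_broker_ids = cur_broker_ids - live_or_shutting_down
    dead_broker_ids = live_or_shutting_down - cur_broker_ids
    return new_broker_ids, dead_broker_ids

# Broker 2 de-registers and re-registers in ZK while the BrokerChange events
# are still queued behind ControlledShutdown. By the time they run, ZK again
# lists {1, 2} and the controller context still lists {1, 2}, so both diffs
# come out empty: neither onBrokerFailure nor onBrokerJoin ever fires.
new, dead = broker_change({1, 2}, {1, 2})
assert new == set() and dead == set()
```

Had the events been processed promptly, the 3rd BrokerChange would have seen `broker_change({1}, {1, 2})` (broker 2 dead) and the 4th `broker_change({1, 2}, {1})` (broker 2 new); coalescing them into a no-op is exactly the miss.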



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-01 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110205#comment-16110205
 ] 

Onur Karaman commented on KAFKA-1120:
-

Thanks [~wushujames] that is perfect. I can reproduce the problem.



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-01 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110002#comment-16110002
 ] 

James Cheng commented on KAFKA-1120:


[~onurkaraman], try this? I made it as copy/pasteable as possible.

{code}
wget http://apache.ip-guide.com/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz
tar xvfz kafka_2.11-0.11.0.0.tgz
cp -r kafka_2.11-0.11.0.0 kafka_2.11-0.11.0.0_2

KAFKA_DIR=kafka_2.11-0.11.0.0
( cd ${KAFKA_DIR};
  cp config/server.properties config/server1.properties;
  echo >> config/server1.properties;
  echo broker.id=1 >> config/server1.properties;
  echo delete.topic.enable=true >> config/server1.properties;
  echo listeners=PLAINTEXT://:9092 >> config/server1.properties;
  echo log.dirs=/tmp/kafka-logs1 >> config/server1.properties;
  echo log.index.size.max.bytes=10 >> config/server1.properties;
  echo controlled.shutdown.max.retries=1 >> config/server1.properties;
)


KAFKA_DIR=kafka_2.11-0.11.0.0_2
( cd ${KAFKA_DIR};
  cp config/server.properties config/server2.properties;
  echo >> config/server2.properties;
  echo broker.id=2 >> config/server2.properties;
  echo delete.topic.enable=true >> config/server2.properties;
  echo listeners=PLAINTEXT://:9093 >> config/server2.properties;
  echo log.dirs=/tmp/kafka-logs2 >> config/server2.properties;
  echo log.index.size.max.bytes=10 >> config/server2.properties;
  echo controlled.shutdown.max.retries=1 >> config/server2.properties;
)

# Start zookeeper and kafka brokers

# In terminal 1, zookeeper
cd kafka_2.11-0.11.0.0
./bin/zookeeper-server-start.sh config/zookeeper.properties

# In terminal 2, broker 1
cd kafka_2.11-0.11.0.0
./bin/kafka-server-start.sh config/server1.properties

# In terminal 3, broker 2
cd kafka_2.11-0.11.0.0_2
./bin/kafka-server-start.sh config/server2.properties


# Create topics

cd kafka_2.11-0.11.0.0
# create 5000 partitions, all leaders on broker 1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic leader1 --replica-assignment `echo -n 1:2; for i in \`seq 4999\`; do echo -n ,1:2; done`

# create 5000 partitions, all leaders on broker 2
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic leader2 --replica-assignment `echo -n 2:1; for i in \`seq 4999\`; do echo -n ,2:1; done`

# Wait until topics are fully created

# Stop and immediately start a broker:
# 1. Verify that broker1 is the controller.
# 2. Ctrl-C on broker2. Broker 2 will attempt controlled shutdown and will
#    give up after 30 seconds.
# 3. The instant it exits, restart broker2.
# 4. Wait until both brokers have settled down and the controller has settled
#    down. Notice that broker2 is missing from lots of ISRs.


{code}
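
For reference, the `--replica-assignment` argument built by the shell one-liner above is just a comma-separated list of 5000 identical replica pairs, one per partition; an equivalent construction (hypothetical Python, shown only to clarify what the backtick expression expands to):

```python
# Equivalent of: echo -n 1:2; for i in `seq 4999`; do echo -n ,1:2; done
# One "leader:follower" pair per partition, 5000 partitions total.
assignment = ",".join(["1:2"] * 5000)
print(assignment[:11])  # → 1:2,1:2,1:2
```

This makes broker 1 the preferred leader (and broker 2 the follower) for every partition of the topic, which is what lets the repro concentrate all leadership on one broker.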



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-08-01 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108543#comment-16108543
 ] 

Ismael Juma commented on KAFKA-1120:


I tentatively assigned this to 1.0.0 so that we investigate the root cause. cc 
[~onurkaraman]



[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-07-31 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107857#comment-16107857
 ] 

James Cheng commented on KAFKA-1120:


Hi,

I retested this with Kafka 0.11. The problem still exists.

I followed the steps from my 24/Feb/17 22:57 comment. I ran it maybe 10 times 
in a row. Every single time, the broker that I restarted came back up and did 
not take leadership for any partitions. In addition, it only became a follower 
for about half of the partitions.

The fact that it became a follower for half of the partitions shows that the 
controller is at least aware that the broker exists (that is, the controller 
successfully saw the broker come back online). But the controller didn't tell 
the broker to follow all the partitions that it should have.




[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-07-03 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072663#comment-16072663
 ] 

Andrew Olson commented on KAFKA-1120:
-

[~wushujames] Can you retest with Kafka 0.11 to see if KAFKA-1211 resolves this 
problem?
