[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

2016-08-04 Thread .D. (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407483#comment-15407483
 ] 

.D. commented on KAFKA-1310:


Excuse me, how can this error be solved when it appears?
What is the underlying principle? Could someone take a look?

> Zookeeper timeout causes deadlock in Controller
> ---
>
> Key: KAFKA-1310
> URL: https://issues.apache.org/jira/browse/KAFKA-1310
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1
> Reporter: Fedor Korotkiy
> Assignee: Neha Narkhede
> Priority: Blocker
> Fix For: 0.8.1.1
>
>
> Steps to reproduce:
> 1. Checkout and build 0.8.1 branch from github:
> git clone g...@github.com:apache/kafka.git && cd kafka && git checkout 
> origin/0.8.1 && ./gradlew jar
> 2. Start zookeeper server:
> ./bin/zookeeper-server-start.sh config/zookeeper.properties
> 3. Start kafka server:
> ./bin/kafka-server-start.sh config/server.properties
> 4. Suspend the zookeeper process for 10 seconds (ctrl-Z, then %1 to resume it).
> 5. Kafka has not re-registered in zookeeper:
> ./bin/zookeeper-shell.sh
> ls /brokers/ids
> >> []
> The root cause of the problem seems to be a deadlock between DeleteTopicsThread 
> and SessionExpirationListener in KafkaController:
> 1. DeleteTopicsThread acquires controllerLock and await()-s on 
> deleteTopicsCond in awaitTopicDeletionNotification().
> 2. SessionExpirationListener fires. It acquires controllerLock and tries to 
> shut down deleteTopicManager (in onControllerResignation()). This interrupts 
> DeleteTopicsThread.
> 3. DeleteTopicsThread can't return from deleteTopicsCond.await() because it 
> must first re-acquire controllerLock, which the listener still holds. The 
> result is a deadlock.
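
The locking behaviour described in steps 1-3 can be illustrated outside of Kafka. The following is only a minimal sketch using plain java.util.concurrent primitives, not Kafka's controller code: the class name AwaitInterruptDemo and the timeouts are assumptions made for illustration, and the variable names merely mirror the ones in the report. It shows that a thread interrupted inside Condition.await() cannot leave await() until it re-acquires the lock, so it stays blocked for as long as the interrupting thread holds that lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class AwaitInterruptDemo {

    /** Returns {stuckWhileLockHeld, finishedAfterUnlock}. */
    public static boolean[] run() throws InterruptedException {
        ReentrantLock controllerLock = new ReentrantLock();
        Condition deleteTopicsCond = controllerLock.newCondition();
        CountDownLatch started = new CountDownLatch(1);

        // Stand-in for DeleteTopicsThread: takes the lock and waits on the condition.
        Thread deleteTopicsThread = new Thread(() -> {
            controllerLock.lock();
            try {
                started.countDown();
                deleteTopicsCond.await(); // releases the lock while waiting
            } catch (InterruptedException e) {
                // Reached only after the lock has been re-acquired.
            } finally {
                controllerLock.unlock();
            }
        });
        deleteTopicsThread.start();
        started.await();

        boolean stuck;
        // Stand-in for the listener: it can only get the lock once the
        // other thread is waiting inside await().
        controllerLock.lock();
        try {
            deleteTopicsThread.interrupt();
            // Bounded wait so the demo itself does not hang; an unbounded
            // wait at this point would never return.
            deleteTopicsThread.join(500);
            stuck = deleteTopicsThread.isAlive();
        } finally {
            controllerLock.unlock();
        }

        // Once the lock is free, the interrupted thread can leave await().
        deleteTopicsThread.join(5000);
        boolean finished = !deleteTopicsThread.isAlive();
        return new boolean[] {stuck, finished};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run();
        System.out.println("stuck while lock held: " + result[0]);
        System.out.println("finished after unlock: " + result[1]);
    }
}
```

The bounded join(500) is what keeps this demo from hanging. Waiting without a timeout for the interrupted thread while still holding the lock would reproduce the deadlock pattern described in the report.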



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

2014-04-09 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964521#comment-13964521
 ] 

Joel Koshy commented on KAFKA-1310:
---

Fixed by KAFKA-1317
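
For readers unfamiliar with this class of bug, one general way to avoid it is to signal the waiting thread while holding the lock, but to wait for its termination only after the lock has been released. The sketch below illustrates that general pattern only; it is not the KAFKA-1317 patch, and the class ShutdownOutsideLockDemo with all of its members is hypothetical.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ShutdownOutsideLockDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeUp = lock.newCondition();
    private boolean running = true; // guarded by lock

    // Worker that waits on the condition until it is told to stop.
    private final Thread worker = new Thread(() -> {
        lock.lock();
        try {
            while (running) {
                wakeUp.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    });

    public void start() {
        worker.start();
    }

    /** Returns true if the worker terminated within joinMillis. */
    public boolean shutdown(long joinMillis) throws InterruptedException {
        lock.lock();
        try {
            running = false;
            wakeUp.signalAll(); // wake the worker so it re-checks the flag
        } finally {
            lock.unlock(); // release BEFORE waiting for the worker
        }
        worker.join(joinMillis); // the worker can now re-acquire the lock and exit
        return !worker.isAlive();
    }

    public static boolean run() throws InterruptedException {
        ShutdownOutsideLockDemo demo = new ShutdownOutsideLockDemo();
        demo.start();
        return demo.shutdown(5000);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker terminated: " + run());
    }
}
```

The important detail is the ordering in shutdown(): the join happens after unlock(), so the worker is never waiting for a lock that the thread joining on it still holds.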



[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

2014-03-28 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951474#comment-13951474
 ] 

Timothy Chen commented on KAFKA-1310:
-

I tried the described repro scenario on the latest 0.8.1 branch. With the 
latest commit, 39a5607, the broker successfully re-registers itself after 
zookeeper has been paused for 10 seconds.



[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

2014-03-28 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951678#comment-13951678
 ] 

Neha Narkhede commented on KAFKA-1310:
--

Very cool. Thanks for verifying that [~tnachen]!



[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

2014-03-21 Thread Michael Noll (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942939#comment-13942939
 ] 

Michael Noll commented on KAFKA-1310:
-

I can confirm this issue, using Kafka 0.8.1.

Here are the error messages when trying to create a topic:

{code}
$ bin/kafka-topics.sh --create --zookeeper zookeeper1:2181 --topic testing --partitions 1 --replication-factor 1
Error while executing topic command 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/ids
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/ids
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
at kafka.utils.ZkUtils$.getChildren(ZkUtils.scala:480)
at kafka.utils.ZkUtils$.getSortedBrokerList(ZkUtils.scala:81)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:154)
at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:88)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:50)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/ids
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1277)
at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99)
at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 8 more
{code}

If you use the ZK CLI you will sometimes see a znode under {{/brokers/ids}}, 
sometimes not.  In my limited testing I could, for instance, create a topic 
(partitions=1, replicas=1) and then list/describe it.  However, sending 
messages to it would fail; see the next example.

When trying to use the console producer to send a test message "foo" (String) 
to the topic/broker:

{code}
$ bin/kafka-console-producer.sh --topic testing --broker-list localhost:9092
foo    # This is the test message, manually entered in the console/terminal
[2014-03-20 09:45:32,223] WARN Error while fetching metadata [{TopicMetadata 
for topic testing -
No partition metadata for topic testing due to 
kafka.common.LeaderNotAvailableException}] for topic [testing]: class 
kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo)
[2014-03-20 09:45:32,233] WARN Error while fetching metadata [{TopicMetadata 
for topic testing -
No partition metadata for topic testing due to 
kafka.common.LeaderNotAvailableException}] for topic [testing]: class 
kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo)
[2014-03-20 09:45:32,234] ERROR Failed to collate messages by topic, partition 
due to: Failed to fetch topic metadata for topic: testing 
(kafka.producer.async.DefaultEventHandler)
{code}

*How to reproduce*

Using Wirbelsturm you can reproduce this error as follows.  This assumes you 
have Vagrant 1.4.x and VirtualBox already installed on your host machine.

{code}
$ git clone https://github.com/miguno/wirbelsturm.git
$ cd wirbelsturm
$ ./bootstrap  # May take a while depending on how fast your Internet connection is.

# Then uncomment the `kafka_broker` section in `wirbelsturm.yaml`.
# Only remove the leading `#` character in each line -- the remaining leading whitespace is significant.
$ vagrant up zookeeper1 kafka1  # May take a while (boots VMs, downloads RPMs from the Internet to provision the VMs, etc.)
{code}

Now you can ssh into the VM {{kafka1}} via {{vagrant ssh kafka1}} and run the 
commands above from within the {{/opt/kafka}} directory.


[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

2014-03-21 Thread Michael Noll (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942947#comment-13942947
 ] 

Michael Noll commented on KAFKA-1310:
-

Also, I can confirm the errors above do not occur with Kafka 0.8.0, using the 
following test commands:

{code}
$ bin/kafka-create-topic.sh --topic testing --zookeeper zookeeper1:2181 --partition 1 --replica 1
creation succeeded!
$ bin/kafka-list-topic.sh --zookeeper zookeeper1:2181
topic: testing  partition: 0    leader: 0   replicas: 0 isr: 0

# Producing data works!
$ bin/kafka-console-producer.sh --topic testing --broker-list localhost:9092
foo
^C

$ bin/kafka-console-consumer.sh --topic testing --zookeeper zookeeper1:2181 --from-beginning
foo
{code}
