[ 
https://issues.apache.org/jira/browse/KAFKA-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fung resolved KAFKA-522.
-----------------------------

    Resolution: Fixed

By increasing the max user processes to 4096, the error doesn't show up any 
more:

ulimit -u 4096

It seems to be related to system resource configuration and not a Kafka issue. 
So mark this FIXED.
                
> OutOfMemoryError in System Test
> -------------------------------
>
>                 Key: KAFKA-522
>                 URL: https://issues.apache.org/jira/browse/KAFKA-522
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>
> A. This is only reproducible in a distributed environment:
> 1. Modify system_test/cluster_config.json to have all broker entities running 
> in 3 different remote hosts
> 2. (Optional) Update system_test/system_test_runner.py to log messages at 
> DEBUG level by uncommenting the following line: 
> # namedLogger.setLevel(logging.DEBUG)
> 3. In <kafka_home>/system_test, run “python -B system_test_runner.py”
> B. In this specific test session, the error occurred in Broker-3:
> [2012-09-20 16:26:25,777] INFO Closing socket connection to /x.x.x.x. 
> (kafka.network.Processor)
> [2012-09-20 16:26:25,778] INFO 3 successfully elected as leader 
> (kafka.server.ZookeeperLeaderElector)
> [2012-09-20 16:26:25,779] INFO [Controller 3], Broker 3 starting become 
> controller state transition (kafka.controller.KafkaController)
> [2012-09-20 16:26:25,950] INFO [Controller-3-to-broker-2-send-thread], 
> Starting  (kafka.controller.RequestSendThread)
> [2012-09-20 16:26:25,950] ERROR Error handling event ZkEvent[Data of 
> /controller changed sent to 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener@75088a1b] 
> (org.I0Itec.zkclient.ZkEventThread)
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$startRequestSendThread(ControllerChannelManager.scala:97)
>         at 
> kafka.controller.ControllerChannelManager$$anonfun$startup$1.apply(ControllerChannelManager.scala:40)
>         at 
> kafka.controller.ControllerChannelManager$$anonfun$startup$1.apply(ControllerChannelManager.scala:40)
>         at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
>         at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:631)
>         at 
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
>         at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
>         at 
> kafka.controller.ControllerChannelManager.startup(ControllerChannelManager.scala:40)
>         at 
> kafka.controller.KafkaController.startChannelManager(KafkaController.scala:230)
>         at 
> kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:223)
>         at 
> kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:72)
>         at 
> kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:47)
>         at 
> kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:55)
>         at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:94)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> [2012-09-20 16:26:31,767] INFO [BrokerChangeListener on Controller 3]: Broker 
> change listener fired for path /brokers/ids with children 3,2,1 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2012-09-20 16:26:31,775] INFO [BrokerChangeListener on Controller 3]: Newly 
> added brokers: 1, deleted brokers: , all brokers: 3,2,1 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2012-09-20 16:26:31,777] ERROR [BrokerChangeListener on Controller 3]: Error 
> while handling broker changes 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$startRequestSendThread(ControllerChannelManager.scala:97)
>         at 
> kafka.controller.ControllerChannelManager.addBroker(ControllerChannelManager.scala:61)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$liftedTree1$1$7.apply(ReplicaStateMachine.scala:212)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$liftedTree1$1$7.apply(ReplicaStateMachine.scala:212)
>         at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.liftedTree1$1(ReplicaStateMachine.scala:212)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:203)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:199)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:199)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:199)
>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> [2012-09-20 16:27:22,821] INFO [BrokerChangeListener on Controller 3]: Broker 
> change listener fired for path /brokers/ids with children 3,2 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2012-09-20 16:27:22,823] INFO [BrokerChangeListener on Controller 3]: Newly 
> added brokers: , deleted brokers: 1, all brokers: 3,2 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2012-09-20 16:27:22,825] INFO [Controller-3-to-broker-1-send-thread], 
> Shutting down (kafka.controller.RequestSendThread)
> [2012-09-20 16:27:23,279] INFO Unable to read additional data from server 
> sessionid 0x139e47e2eb00004, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2012-09-20 16:27:23,379] INFO zookeeper state changed (Disconnected) 
> (org.I0Itec.zkclient.ZkClient)
> C. At around the same time, the following are the main test output:
> 2012-09-20 16:26:25,251 - INFO - 
> ======================================================
> 2012-09-20 16:26:25,252 - INFO - Iteration 2 of 2
> 2012-09-20 16:26:25,252 - INFO - 
> ======================================================
> 2012-09-20 16:26:25,252 - INFO - looking up leader... 
> (kafka_system_test_utils)
> 2012-09-20 16:26:25,252 - DEBUG - executing command [ssh host2 "grep -i -h 
> 'Completed the leader state transition'  
> /mnt/u001/kafka_08_replication_system_test/system_test/replication_testsuite/testcase_1/logs/broker-1/kafka_server_9091.log
>  |  sort | tail -1"] (kafka_system_test_utils)
> 2012-09-20 16:26:25,344 - DEBUG - found the log line : [2012-09-20 
> 16:25:45,195] INFO Replica Manager on Broker 1: Completed the leader state 
> transition for topic test_1 partition 0 (kafka.server.ReplicaManager) 
> (kafka_system_test_utils)
> 2012-09-20 16:26:25,344 - DEBUG - brokerid: [1] entity_id: [1] 
> (kafka_system_test_utils)
> 2012-09-20 16:26:25,345 - DEBUG - executing command [ssh host3 "grep -i -h 
> 'Completed the leader state transition'  
> /mnt/u001/kafka_08_replication_system_test/system_test/replication_testsuite/testcase_1/logs/broker-2/kafka_server_9092.log
>  |  sort | tail -1"] (kafka_system_test_utils)
> 2012-09-20 16:26:25,455 - DEBUG - executing command [ssh host3 "grep -i -h 
> 'Completed the leader state transition'  
> /mnt/u001/kafka_08_replication_system_test/system_test/replication_testsuite/testcase_1/logs/broker-3/kafka_server_9093.log
>  |  sort | tail -1"] (kafka_system_test_utils)
> 2012-09-20 16:26:25,540 - INFO - 
> ======================================================
> 2012-09-20 16:26:25,540 - INFO - validating leader election
> 2012-09-20 16:26:25,540 - INFO - 
> ======================================================
> 2012-09-20 16:26:25,540 - INFO - found leader in entity [1] with brokerid [1] 
> for partition [0] (kafka_system_test_utils)
> 2012-09-20 16:26:25,541 - INFO - 
> ======================================================
> 2012-09-20 16:26:25,541 - INFO - bounce_leader flag : true
> 2012-09-20 16:26:25,541 - INFO - 
> ======================================================
> 2012-09-20 16:26:25,541 - INFO - stopping leader in entity 1 with pid 32679 
> (kafka_system_test_utils)
> 2012-09-20 16:26:25,541 - DEBUG - executing command [ssh host2 'pid=32679; 
> prev_pid=""; echo $pid; while [[ "x$pid" != "x" ]]; do prev_pid=$pid;   for 
> child in $(ps -o pid,ppid ax | awk "{ if ( \$2 == $pid ) { print \$1 }}");    
>  do echo $child; pid=$child;   done;   if [ $prev_pid == $pid ]; then     
> break;   fi; done' 2> /dev/null (system_test_utils)
> 2012-09-20 16:26:25,669 - DEBUG - terminating process id: 32679 in host: 
> host2 (kafka_system_test_utils)
> 2012-09-20 16:26:25,670 - DEBUG - executing command [ssh host2 'kill -15 
> 32681'] (system_test_utils)
> 2012-09-20 16:26:25,673 - DEBUG - executing command [ssh host2 'kill -15 
> 32679'] (system_test_utils)
> 2012-09-20 16:26:25,675 - INFO - sleeping for 5s for leader re-election to 
> complete (kafka_system_test_utils)
> 2012-09-20 16:26:30,681 - INFO - looking up broker shutdown... 
> (kafka_system_test_utils)
> 2012-09-20 16:26:30,681 - DEBUG - executing command [ssh host2 "grep -i -h 
> 'shut down completed'  
> /mnt/u001/kafka_08_replication_system_test/system_test/replication_testsuite/testcase_1/logs/broker-1/kafka_server_9091.log
>  |  sort | tail -1"] (kafka_system_test_utils)
> 2012-09-20 16:26:30,781 - DEBUG - found the log line : [2012-09-20 
> 16:26:25,776] INFO [Kafka Server 1], shut down completed 
> (kafka.server.KafkaServer) (kafka_system_test_utils)
> 2012-09-20 16:26:30,781 - DEBUG - brokerid: [1] entity_id: [1] 
> (kafka_system_test_utils)
> 2012-09-20 16:26:30,781 - DEBUG - unix timestamp of shut down completed: 
> 1348158385.776000 (kafka_system_test_utils)
> 2012-09-20 16:26:30,781 - DEBUG - looking up new leader 
> (kafka_system_test_utils)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to