[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test

Joel Koshy (JIRA) Tue, 11 Dec 2012 15:43:22 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529489#comment-13529489
 ]


Joel Koshy commented on KAFKA-664:
----------------------------------

One problem with the updated patch is the following: multi-fetch requests could 
be satisfied (since minBytes is 1).
i.e., the request will be marked as satisfied and the expiration code path will 
not be executed. We should do the update
on expiration even if the request has been satisfied.

So an additional "catch-all" that may make life easier for us is to have a 
global threshold of the requestsFor map - i.e.,
if its size exceeds a threshold (which will be checked on both 
expiration/checkSatisfied) then trigger a full cleanup - i.e.,
iterate over all entries in watchersFor and remove those that are satisfied.
                
> Kafka server threads die due to OOME during long running test
> -------------------------------------------------------------
>
>                 Key: KAFKA-664
>                 URL: https://issues.apache.org/jira/browse/KAFKA-664
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Jay Kreps
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-664-draft-2.patch, kafka-664-draft.patch, Screen 
> Shot 2012-12-09 at 11.22.50 AM.png, Screen Shot 2012-12-09 at 11.23.09 
> AM.png, Screen Shot 2012-12-09 at 11.31.29 AM.png, thread-dump.log, 
> watchersForKey.png
>
>
> I set up a Kafka cluster with 5 brokers (JVM memory 512M) and set up a long 
> running producer process that sends data to 100s of partitions continuously 
> for ~15 hours. After ~4 hours of operation, few server threads (acceptor and 
> processor) exited due to OOME -
> [2012-12-07 08:24:44,355] ERROR OOME with size 1700161893 
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 
> 'kafka-acceptor': (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 
> 'kafka-processor-9092-1': (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:24:46,344] INFO Unable to reconnect to ZooKeeper service, 
> session 0x13afd0753870103 has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:24:46,344] INFO zookeeper state changed (Expired) 
> (org.I0Itec.zkclient.ZkClient)
> [2012-12-07 08:24:46,344] INFO Initiating client connection, 
> connectString=eat1-app309.corp:12913,eat1-app310.corp:12913,eat1-app311.corp:12913,eat1-app312.corp:12913,eat1-app313.corp:12913
>  sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@19202d69 
> (org.apache.zookeeper.ZooKeeper)
> [2012-12-07 08:24:55,702] ERROR OOME with size 2001040997 
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:25:01,192] ERROR Uncaught exception in thread 
> 'kafka-request-handler-0': (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:25:08,739] INFO Opening socket connection to server 
> eat1-app311.corp/172.20.72.75:12913 (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:25:14,221] INFO Socket connection established to 
> eat1-app311.corp/172.20.72.75:12913, initiating session 
> (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:25:17,943] INFO Client session timed out, have not heard from 
> server in 3722ms for sessionid 0x0, closing socket connection and attempting 
> reconnect (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:25:19,805] ERROR error in loggedRunnable (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:25:23,528] ERROR OOME with size 1853095936 
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> It seems like it runs out of memory while trying to read the producer 
> request, but its unclear so far. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test

Reply via email to