Java / OS info:
--
java.specification.version = 1.8
java.vendor = Oracle Corporation
java.version = 1.8.0_45
Oracle Linux Server release 6.7
kernel version 2.6.32-573.18.1.el6.x86_64
Redacted LSOF
-
~46K Close Waits
--
java 4692 kafka 2618u IPv6 264581081 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host1:33089 (CLOSE_WAIT)
java 4692 kafka 2619u IPv6 264581082 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host2:37371 (CLOSE_WAIT)
java 4692 kafka 2621u IPv6 264600187 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host3:40788 (CLOSE_WAIT)
475 Established connections
java 4692 kafka *427u IPv6 282382725 0t0 TCP XX--kafka01:54099->XX--host1:eforward (ESTABLISHED)
java 4692 kafka *639u IPv6 282426735 0t0 TCP XX--kafka01:36157->XX--kafka01:59964 (ESTABLISHED)
java 4692 kafka *860u IPv6 282480072 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host2:50547 (ESTABLISHED)
java 4692 kafka *507u IPv6 282481853 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host3:45096 (ESTABLISHED)
~3K
java 4692 kafka 2367u REG 253,3 104857335 141033710 /XXX/kafka/LOG/__consumer_offsets-10/35177234.log
~1.5K
java 4692 kafka mem REG 253,3 10485760 141297356 /XXX/kafka/LOG/TOPIC-1-9/00028243.index
~1.5K
java 4692 kafka 818u REG 253,3 2548089 141297556 /XXX/kafka/LOG/TOPIC-1-2-76/00146894.log
java 4692 kafka 819u REG 253,3 0 141165545 /XXX/kafka/LOG/TOPIC-2-2-11/.log
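
For anyone who wants to reproduce counts like the ones above from a raw lsof dump, here is a minimal Python sketch. It is not part of Kafka or lsof; the function names and the SAMPLE text are illustrative assumptions. It only looks at the trailing "(STATE)" that lsof prints for TCP sockets, so it should work whether or not the mail client has wrapped the lines.

```python
import re
from collections import Counter

# lsof prints the TCP state in parentheses at the end of the socket name,
# e.g. "... XX--kafka01:XmlIpcRegSvc->XX--host1:33089 (CLOSE_WAIT)".
STATE_RE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")

# Captures the remote host of a CLOSE_WAIT socket: the text between "->"
# and the last ":" before the port or service name.
CLOSE_WAIT_PEER_RE = re.compile(r"->(\S+):[^:\s]+\s+\(CLOSE_WAIT\)\s*$")

# Illustrative sample only; hostnames are redacted as in the listing above.
SAMPLE = """\
java 4692 kafka 2618u IPv6 264581081 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host1:33089 (CLOSE_WAIT)
java 4692 kafka 2619u IPv6 264581082 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host2:37371 (CLOSE_WAIT)
java 4692 kafka *860u IPv6 282480072 0t0 TCP XX--kafka01:XmlIpcRegSvc->XX--host2:50547 (ESTABLISHED)
java 4692 kafka 818u REG 253,3 2548089 141297556 /XXX/kafka/LOG/TOPIC-1-2-76/00146894.log
"""


def tally_tcp_states(lsof_text):
    """Count how many sockets are in each TCP state."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = STATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def tally_close_wait_peers(lsof_text):
    """Count CLOSE_WAIT sockets per remote host, to show which clients leak."""
    peers = Counter()
    for line in lsof_text.splitlines():
        match = CLOSE_WAIT_PEER_RE.search(line)
        if match:
            peers[match.group(1)] += 1
    return peers


print(tally_tcp_states(SAMPLE))
print(tally_close_wait_peers(SAMPLE))
```

To run it against a live broker, one option is to save the output of something like `lsof -a -p 4692 -i TCP` to a file and pass the file contents to the two functions in place of SAMPLE. Grouping by remote host is just one way to slice it; it may help show whether the CLOSE_WAIT sockets come from a handful of clients or from all of them.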
On Fri, Aug 26, 2016 at 6:37 AM, Jaikiran Pai wrote:
> Which Java vendor and version are you using at runtime? Also, what OS is
> this? Can you get the lsof output (on Linux) and paste it somewhere
> (like a gist) to show us which descriptors are open, etc.?
>
> -Jaikiran
>
>
> On Friday 26 August 2016 02:49 AM, Bharath Srinivasan wrote:
>
>> Hello:
>>
>> We are running a data pipeline application stack using Kafka 0.8.2.2 in
>> production. We have been seeing intermittent but frequent buildups of
>> CLOSE_WAIT connections on our Kafka brokers, and they fill up the file
>> handles pretty quickly. By the time the open file count reaches around
>> 40K, the node becomes unresponsive and we see huge GC pauses. The only
>> way out has been to restart the node. When the nodes are working fine,
>> the open file count per node stays around 6K during peak load and 3K on
>> average.
>>
>> Configurations:
>> - 5 broker cluster (Single node spec: 24 core processors, 250 GB RAM,
>> 256 GB SSD)
>> - 20 topics and 1100 partitions across all topics
>> - Replication factor of 3
>> - Java based KafkaProducer and high level consumers
>> (ZookeeperConsumerConnector)
>> - GC params { -Xmx32G -Xms4G -server -XX:MetaspaceSize=96m -XX:+UseG1GC
>> -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
>> -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50
>> -XX:MaxMetaspaceFreeRatio=80 }
>>
>> Any pointers here? Appreciate your help.
>>
>> Thanks,
>> Bharath
>>
>>
>