[ https://issues.apache.org/jira/browse/KAFKA-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839036#comment-16839036 ]
David commented on KAFKA-8103:
------------------------------

[~ijuma] We have gotten a few other cases over the last month that have different top level errors on this same cluster. Do you think they could be from the same underlying issue?

{code:java}
# Problematic frame:
# J 16372 C2 org.apache.kafka.common.network.Selector.pollSelectionKeys(Ljava/util/Set;ZJ)V (543 bytes) @ 0x00007fc0233ebe0c [0x00007fc0233eb8c0+0x54c]
#Register to memory mapping:
RAX=0x0000000000000001 is an unknown value
RBX=0x00007fbe5fb72880 is pointing into the stack for thread: 0x00007fc0315a8800
RCX=0x0000000000000040 is an unknown value
RDX=0x0000000000001762 is an unknown value
RSP=0x00007fbe5fb728a0 is pointing into the stack for thread: 0x00007fc0315a8800
RBP=0x00000000eafa9b17 is an unknown value
RSI=0x000000054f001f38 is an oop sun.nio.ch.EPollArrayWrapper - klass: 'sun/nio/ch/EPollArrayWrapper'
RDI=0x00000000a9e003e7 is an unknown value
R8 =0x0000000000000000 is an unknown value
R9 =0x0000000000010000 is an unknown value
R10=0x0000000000000000 is an unknown value
R11=0x00000000aa418a8b is an unknown value
R12=0x0000000000000000 is an unknown value
R13=0x00000005520c5458 is an oop sun.nio.ch.SelectionKeyImpl - klass: 'sun/nio/ch/SelectionKeyImpl'
R14=0x000000055063faa8 is an oop java.lang.Object - klass: 'java/lang/Object'
R15=0x00007fc0315a8800 is a thread
{code}

{code:java}
# J 1826 C2 java.nio.Buffer.limit(I)Ljava/nio/Buffer; (62 bytes) @ 0x00007fa0216f52c0 [0x00007fa0216f52a0+0x20]
Register to memory mapping:
RAX=0x000000073f09e6a0 is an oop java.nio.HeapByteBuffer - klass: 'java/nio/HeapByteBuffer'
RBX=0x000000073f09e6a0 is an oop java.nio.HeapByteBuffer - klass: 'java/nio/HeapByteBuffer'
RCX=0x000000000000a6cf is an unknown value
RDX=0x000000000000a6cf is an unknown value
RSP=0x00007f9d6f137748 is pointing into the stack for thread: 0x00007fa031485800
RBP=0x000000054a3a69e0 is an oop org.apache.kafka.common.network.PlaintextTransportLayer - klass: 'org/apache/kafka/common/network/PlaintextTransportLayer'
RSI=0x000000073f09e6a0 is an oop java.nio.HeapByteBuffer - klass: 'java/nio/HeapByteBuffer'
RDI=0x00007f9fb13f84f3 is an unknown value
R8 =0x000000074041c2c8 is an oop java.nio.HeapByteBuffer - klass: 'java/nio/HeapByteBuffer'
R9 =0x00000000e808385f is an unknown value
R10=0x000000000000a6cf is an unknown value
R11=0x000000073f09e6a0 is an oop java.nio.HeapByteBuffer - klass: 'java/nio/HeapByteBuffer'
R12=0x0000000000000000 is an unknown value
R13=0x00007f9fada00000 is an unknown value
R14=0x00000000e80852ac is an unknown value
R15=0x00007fa031485800 is a thread
{code}

{code:java}
# Problematic frame:
# J 10102 C2 sun.nio.ch.FileChannelImpl.size()J (239 bytes) @ 0x00007fdc9aa2aa40 [0x00007fdc9aa2aa20+0x20]
Register to memory mapping:
RAX=0x0000000000000026 is an unknown value
RBX=0x0000000000000163 is an unknown value
RCX=0x0000000000000163 is an unknown value
RDX=0x0000000549a29d38 is an oop sun.nio.ch.Util$1 - klass: 'sun/nio/ch/Util$1'
RSP=0x00007fdb18cc8848 is pointing into the stack for thread: 0x00007fdca943e000
RBP=0x0000000000000062 is an unknown value
RSI=0x0000000594f7e9b8 is an oop sun.nio.ch.FileChannelImpl - klass: 'sun/nio/ch/FileChannelImpl'
RDI=0x00007fdc5a042988 is an unknown value
R8 =0x0000000000000000 is an unknown value
R9 =0x00000000f805747d is an unknown value
R10=0x00000007190b80c0 is an oop org.apache.kafka.common.record.FileRecords - klass: 'org/apache/kafka/common/record/FileRecords'
R11=0x00000000b29efd37 is an unknown value
R12=0x0000000000000000 is an unknown value
R13=0x00000007190c0800 is an oop java.util.ArrayDeque - klass: 'java/util/ArrayDeque'
R14=0x0000000000000061 is an unknown value
R15=0x00007fdca943e000 is a thread
{code}

> Kafka SIGSEGV on kafka-network-thread
> -------------------------------------
>
> Key: KAFKA-8103
> URL: https://issues.apache.org/jira/browse/KAFKA-8103
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 1.1.1
> Environment: OS
> Amazon Linux
> Kernel
> 4.14.97-74.72.amzn1.x86_64 #1 SMP Tue Feb 5 20:59:30 UTC 2019 x86_64 x86_64
> x86_64 GNU/Linux
> Java
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> AWS Instance Type
> c5.4xlarge
> Reporter: Sean Humbarger
> Priority: Major
> Attachments: hs_err_pid4345.log
>
>
> We have a 4 node cluster (6 topics, 6 consumer groups) that is processing
> 65,000 messages per second and are seeing SIGSEGV crashes at least once a day
> (see attachment). Each broker has six disks attached to it to support the
> kafka logs. When the crash occurs, we simply restart kafka and everything
> seems fine. We don't see anything out of the ordinary in /var/log/messages
> or dmesg when the crashes occur. Thus far, we are unable to predict during
> the day when the crash will occur or which node it will occur on.
>
> The problematic frame is as follows:
> {code:java}
> # Problematic frame:
> # J 8628 C2
> org.apache.kafka.common.metrics.stats.Max.update(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V
> (13 bytes) @ 0x00007ff779f9fca0 [0x00007ff779f9fc80+0x20]
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)