[jira] [Commented] (KAFKA-8103) Kafka SIGSEGV on kafka-network-thread
[ https://issues.apache.org/jira/browse/KAFKA-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803933#comment-16803933 ]

Sean Humbarger commented on KAFKA-8103:
---

We are still seeing random JVM crashes. We've switched over from OpenJDK to Oracle 1.8.202 and see the same thing:
{code}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7f6c9cd85100, pid=4550, tid=0x7f6a64792700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_202-b08) (build 1.8.0_202-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.202-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x2c7100] Handle::Handle(Thread*, oopDesc*)+0x0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#

--- T H R E A D ---

Current thread (0x7f6c99279000): JavaThread "kafka-request-handler-3" daemon [_thread_in_vm, id=4984, stack(0x7f6a64692000,0x7f6a64793000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x7f6c9cd85100

Registers:
RAX=0x7f6c9da7b2ce, RBX=0x7f6c99279000, RCX=0x0005502d3250, RDX=0x0005502d3250
RSP=0x7f6a647913d8, RBP=0x7f6a64791410, RSI=0x7f6c99279000, RDI=0x7f6a647913e8
R8 =0xaa05a64a, R9 =0x000550a3d9b8, R10=0x7f6c9d488af0, R11=0x0002
R12=0x7f6a64791470, R13=0x0005502d3250, R14=0x7f6c9da85f8c, R15=0x7f6c99279000
RIP=0x7f6c9cd85100, EFLAGS=0x00010246, CSGSFS=0x002b0033, ERR=0x0015
TRAPNO=0x000e

Top of Stack: (sp=0x7f6a647913d8)
0x7f6a647913d8: 7f6c9d488b38 00700070
0x7f6a647913e8: 7f6c8a4e702c 7f67d003adaa
0x7f6a647913f8: f2e95d57
0x7f6a64791408: a9e2f767 a9e05757
0x7f6a64791418: 7f6c88e78c88 a9e05757
0x7f6a64791428: 7f6c8a0d149c 0005aa05a64a
0x7f6a64791438: 0005502d3250 0007974aeab8
0x7f6a64791448: 00054f02bab8 00054f17bb60
0x7f6a64791458: 7f6c9dbbacdd 0007974adfe0
0x7f6a64791468: 7f6c9d3cc53f 0003
0x7f6a64791478: 0d70aa32 00054f037260
0x7f6a64791488: 7f6c8ab3cc88 aa147bbef4590578
0x7f6a64791498: 0007a2c82bc0 0007974ae068
0x7f6a647914a8: 0007974adcb0 0007974adfe0
0x7f6a647914b8: 000550a3ddf0 0007974adcf8
0x7f6a647914c8: 0007974adc48 0007a2c83420
0x7f6a647914d8: 7f6cf2e95b69 a9e269d8
0x7f6a647914e8: 7f6c8a2c0934 0007974adba0
0x7f6a647914f8: 7f6c8a000138 7f6a64791550
0x7f6a64791508: 7ffed59c2c60 7f6a64791580
0x7f6a64791518: 0002f4590578 00079d149f38
0x7f6a64791528: 000550a3ddf0 7f6a64791570
0x7f6a64791538: f4590684
0x7f6a64791548: 00079d1b3490 7f6a64791590
0x7f6a64791558: 7f6c9dbbacdd 0007f3a366fd
0x7f6a64791568: 7f6c9dbbacdd 0007974adb60
0x7f6a64791578: 7f6c9d3cc53f 00011172
0x7f6a64791588: 0d7091ab f4590567
0x7f6a64791598: 7f6c8a887a24 0007a2c82bc0
0x7f6a647915a8: 0007a2c832d0 aa147bbe15bd
0x7f6a647915b8: f2e95b56974adb48 0005f2e95b67
0x7f6a647915c8: 0007974adb38 0007974adab0

Instructions: (pc=0x7f6c9cd85100)
0x7f6c9cd850e0: e8 0b 4c 77 00 48 83 c4 30 5b 41 5c 5d c3 66 90
0x7f6c9cd850f0: f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
0x7f6c9cd85100: 48 85 d2 74 63 55 48 89 e5 41 55 41 54 53 49 89
0x7f6c9cd85110: fc 48 89 d3 48 83 ec 08 4c 8b ae 38 01 00 00 49

Register to memory mapping:
RAX=0x7f6c9da7b2ce: in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so at 0x7f6c9cabe000
RBX=0x7f6c99279000 is a thread
RCX=0x0005502d3250 is an oop java.lang.Object - klass: 'java/lang/Object'
RDX=0x0005502d3250 is an oop java.lang.Object - klass: 'java/lang/Object'
RSP=0x7f6a647913d8 is pointing into the stack for thread: 0x7f6c99279000
RBP=0x7f6a64791410 is pointing into the stack for thread: 0x7f6c99279000
RSI=0x7f6c99279000 is a thread
RDI=0x7f6a647913e8 is pointing into the stack for thread: 0x7f6c99279000
R8 =0xaa05a64a is an unknown value
R9 =0x000550a3d9b8 is an oop org.apache.kafka.common.utils.KafkaThread - klass: 'org/apache/kafka/common/utils/KafkaThread'
R10=0x7f6c9d488af0: in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so at 0x7f6c9cabe000
R11=0x0002 is an unknown value
R12=0x7f6a64791470 is pointing into the s
[jira] [Updated] (KAFKA-8103) Kafka SIGSEGV on kafka-network-thread
[ https://issues.apache.org/jira/browse/KAFKA-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Humbarger updated KAFKA-8103:
--
Description:
We have a 4 node cluster (6 topics, 6 consumer groups) that is processing 65,000 messages per second and are seeing SIGSEGV crashes at least once a day (see attachment). Each broker has six disks attached to it to support the kafka logs. When the crash occurs, we simply restart kafka and everything seems fine. We don't see anything out of the ordinary in /var/log/messages or dmesg when the crashes occur. Thus far, we are unable to predict during the day when the crash will occur or which node it will occur on.

The problematic frame is as follows:
{code:java}
# Problematic frame:
# J 8628 C2 org.apache.kafka.common.metrics.stats.Max.update(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V (13 bytes) @ 0x7ff779f9fca0 [0x7ff779f9fc80+0x20]
{code}

was:
We have a 4 node cluster (6 topics, 6 consumer groups) that is processing 65,000 messages per second and are seeing SIGSEGV crashes at least once a day (see attachment). Each broker has six disks attached to it to support the kafka logs. When the crash occurs, we simply restart kafka and everything seems fine. We don't see any out of the ordinary in /var/log/messages or dmesg when the crashes occur. Thus far, we are unable to predict during the day when the crash will occur or which node it will occur on.

The problematic frame is as follows:
{code}
# Problematic frame:
# J 8628 C2 org.apache.kafka.common.metrics.stats.Max.update(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V (13 bytes) @ 0x7ff779f9fca0 [0x7ff779f9fc80+0x20]
{code}

> Kafka SIGSEGV on kafka-network-thread
> -
>
> Key: KAFKA-8103
> URL: https://issues.apache.org/jira/browse/KAFKA-8103
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 1.1.1
> Environment: OS
> Amazon Linux
> Kernel
> 4.14.97-74.72.amzn1.x86_64 #1 SMP Tue Feb 5 20:59:30 UTC 2019 x86_64 x86_64
> x86_64 GNU/Linux
> Java
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> AWS Instance Type
> c5.4xlarge
> Reporter: Sean Humbarger
> Priority: Major
> Attachments: hs_err_pid4345.log
>
>
> We have a 4 node cluster (6 topics, 6 consumer groups) that is processing
> 65,000 messages per second and are seeing SIGSEGV crashes at least once a day
> (see attachment). Each broker has six disks attached to it to support the
> kafka logs. When the crash occurs, we simply restart kafka and everything
> seems fine. We don't see anything out of the ordinary in /var/log/messages
> or dmesg when the crashes occur. Thus far, we are unable to predict during
> the day when the crash will occur or which node it will occur on.
>
> The problematic frame is as follows:
> {code:java}
> # Problematic frame:
> # J 8628 C2
> org.apache.kafka.common.metrics.stats.Max.update(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V
> (13 bytes) @ 0x7ff779f9fca0 [0x7ff779f9fc80+0x20]
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
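As a reading aid for the problematic frame above: the text in parentheses after `Max.update` is a JVM method descriptor, which lists parameter types and the return type in a compact encoding (`D` is `double`, `J` is `long`, `V` is `void`, `L...;` is a class). A minimal sketch of a decoder follows. The class `MethodDescriptorDecoder` and its methods are hypothetical helpers written for illustration; they are not part of Kafka or the JDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Kafka or JDK class): decodes a JVM method descriptor. */
final class MethodDescriptorDecoder {

    /** Reads one type starting at pos[0] and advances pos[0] past it. */
    private static String readType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return readType(d, pos) + "[]"; // array of the following type
            case 'L': {
                // Class type: internal name terminated by ';', with '/' as package separator.
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class name in: " + d);
                }
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("Unexpected descriptor character: " + c);
        }
    }

    /** Returns the parameter types between '(' and ')'. */
    static List<String> parameterTypes(String descriptor) {
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> types = new ArrayList<>();
        int[] pos = {1}; // skip the opening '('
        while (descriptor.charAt(pos[0]) != ')') {
            types.add(readType(descriptor, pos));
        }
        return types;
    }

    /** Returns the type that follows ')'. */
    static String returnType(String descriptor) {
        int[] pos = {descriptor.indexOf(')') + 1};
        return readType(descriptor, pos);
    }

    public static void main(String[] args) {
        // Descriptor copied from the problematic frame in this issue.
        String d = "(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;"
                 + "Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V";
        System.out.println(parameterTypes(d) + " -> " + returnType(d));
    }
}
```

Applied to the frame in this issue, the descriptor reads as a `void` method taking a `SampledStat$Sample`, a `MetricConfig`, a `double`, and a `long`.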
[jira] [Updated] (KAFKA-8103) Kafka SIGSEGV on kafka-network-thread
[ https://issues.apache.org/jira/browse/KAFKA-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Humbarger updated KAFKA-8103:
--
Environment:
OS
Amazon Linux
Kernel
4.14.97-74.72.amzn1.x86_64 #1 SMP Tue Feb 5 20:59:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Java
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
AWS Instance Type
c5.4xlarge

was:
OS
{code}
Amazon Linux
{code}
Kernel
{code}
4.14.97-74.72.amzn1.x86_64 #1 SMP Tue Feb 5 20:59:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
{code}
Java
{code}
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
{code}
AWS Instance Type
{code}
c5.4xlarge
{code}

> Kafka SIGSEGV on kafka-network-thread
> -
>
> Key: KAFKA-8103
> URL: https://issues.apache.org/jira/browse/KAFKA-8103
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 1.1.1
> Environment: OS
> Amazon Linux
> Kernel
> 4.14.97-74.72.amzn1.x86_64 #1 SMP Tue Feb 5 20:59:30 UTC 2019 x86_64 x86_64
> x86_64 GNU/Linux
> Java
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> AWS Instance Type
> c5.4xlarge
> Reporter: Sean Humbarger
> Priority: Major
> Attachments: hs_err_pid4345.log
>
>
> We have a 4 node cluster (6 topics, 6 consumer groups) that is processing
> 65,000 messages per second and are seeing SIGSEGV crashes at least once a day
> (see attachment). Each broker has six disks attached to it to support the
> kafka logs. When the crash occurs, we simply restart kafka and everything
> seems fine. We don't see any out of the ordinary in /var/log/messages or
> dmesg when the crashes occur. Thus far, we are unable to predict during the
> day when the crash will occur or which node it will occur on.
>
> The problematic frame is as follows:
> {code}
> # Problematic frame:
> # J 8628 C2
> org.apache.kafka.common.metrics.stats.Max.update(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V
> (13 bytes) @ 0x7ff779f9fca0 [0x7ff779f9fc80+0x20]
> {code}
[jira] [Created] (KAFKA-8103) Kafka SIGSEGV on kafka-network-thread
Sean Humbarger created KAFKA-8103:
-

Summary: Kafka SIGSEGV on kafka-network-thread
Key: KAFKA-8103
URL: https://issues.apache.org/jira/browse/KAFKA-8103
Project: Kafka
Issue Type: Bug
Affects Versions: 1.1.1
Environment: OS
{code}
Amazon Linux
{code}
Kernel
{code}
4.14.97-74.72.amzn1.x86_64 #1 SMP Tue Feb 5 20:59:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
{code}
Java
{code}
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
{code}
AWS Instance Type
{code}
c5.4xlarge
{code}
Reporter: Sean Humbarger
Attachments: hs_err_pid4345.log

We have a 4 node cluster (6 topics, 6 consumer groups) that is processing 65,000 messages per second and are seeing SIGSEGV crashes at least once a day (see attachment). Each broker has six disks attached to it to support the kafka logs. When the crash occurs, we simply restart kafka and everything seems fine. We don't see any out of the ordinary in /var/log/messages or dmesg when the crashes occur. Thus far, we are unable to predict during the day when the crash will occur or which node it will occur on.

The problematic frame is as follows:
{code}
# Problematic frame:
# J 8628 C2 org.apache.kafka.common.metrics.stats.Max.update(Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V (13 bytes) @ 0x7ff779f9fca0 [0x7ff779f9fc80+0x20]
{code}
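For readers unfamiliar with the frame line: the leading `J` marks a compiled Java frame, `8628` is the compilation ID, `C2` names the JIT compiler, and the trailing `@ pc [start+offset]` gives the faulting address as an offset into the compiled method's code. The sketch below extracts those fields and checks that the start address plus the offset equals the reported pc. The class `CompiledFrameCheck` and its methods are hypothetical names used only for illustration, and the regular expressions assume the exact line layout shown in this report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration: sanity-checks a compiled-frame line from a crash log. */
final class CompiledFrameCheck {

    // Matches the tail "@ 0x<pc> [0x<start>+0x<offset>]" of the frame line.
    private static final Pattern ADDRESSES = Pattern.compile(
            "@ 0x([0-9a-fA-F]+) \\[0x([0-9a-fA-F]+)\\+0x([0-9a-fA-F]+)\\]");

    // Matches the "J <compile id>" prefix of a compiled Java frame.
    private static final Pattern COMPILE_ID = Pattern.compile("\\bJ (\\d+)\\b");

    /** Returns true when start + offset equals the reported pc. */
    static boolean addressesAgree(String frameLine) {
        Matcher m = ADDRESSES.matcher(frameLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No address triple found in: " + frameLine);
        }
        long pc = Long.parseUnsignedLong(m.group(1), 16);
        long start = Long.parseUnsignedLong(m.group(2), 16);
        long offset = Long.parseUnsignedLong(m.group(3), 16);
        return start + offset == pc;
    }

    /** Returns the compilation ID that follows the 'J' marker. */
    static int compileId(String frameLine) {
        Matcher m = COMPILE_ID.matcher(frameLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No compiled-frame marker found in: " + frameLine);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        // Frame line copied from this issue's description.
        String line = "# J 8628 C2 org.apache.kafka.common.metrics.stats.Max.update("
                + "Lorg/apache/kafka/common/metrics/stats/SampledStat$Sample;"
                + "Lorg/apache/kafka/common/metrics/MetricConfig;DJ)V (13 bytes) "
                + "@ 0x7ff779f9fca0 [0x7ff779f9fc80+0x20]";
        System.out.println("compile id: " + compileId(line));
        System.out.println("addresses agree: " + addressesAgree(line));
    }
}
```

For the frame in this report, 0x7ff779f9fc80 + 0x20 is 0x7ff779f9fca0, so the crash address lies 32 bytes into the C2-compiled code of `Max.update`.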