[
https://issues.apache.org/jira/browse/KAFKA-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930807#comment-17930807
]
Hasil Sharma commented on KAFKA-18753:
--------------------------------------
> The remote index cache size is set to 512MB. We began with the default 1 GB,
> though that resulted in too many files and Kafka started to run out of file
> descriptors.
We increased the cache size to 1GB, which reduces how often the error occurs
but does not stop it. Could there be a race condition between the remote index
cache purging an index and a concurrent attempt to read that same index as part
of some command?
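To make the suspected race concrete: if the cache closes (and unmaps) an evicted index while a reader thread is still binary-searching its buffer, the read touches memory that is no longer mapped, which can crash the JVM with a SIGSEGV instead of raising a Java exception. Below is a minimal sketch of one common way to rule out such a race: a reference count that defers the close until the last reader has released the index. This is not Kafka's RemoteIndexCache; the class RefCountedIndex and the closer callback are hypothetical names used only for illustration.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Kafka code): a reference-counted index handle.
 * The cache owns one reference; each reader takes another while it reads.
 * The underlying resource is closed only when the count reaches zero.
 */
public class RefCountedIndex {
    // Starts at 1 because the cache itself holds a reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    // Hypothetical close action, e.g. unmapping the index file.
    private final Runnable closer;
    private volatile boolean closed = false;

    public RefCountedIndex(Runnable closer) {
        this.closer = closer;
    }

    /** Called by a reader before touching the buffer; false if already closed. */
    public boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false; // every holder has released; the index is gone
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Called by a reader when done, and once by the cache on eviction. */
    public void release() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            closed = true;
            closer.run(); // safe: no reader holds the buffer any more
        } else if (n < 0) {
            throw new IllegalStateException("released more times than acquired");
        }
    }

    public boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCountedIndex idx = new RefCountedIndex(() -> System.out.println("unmapped"));
        if (!idx.tryAcquire()) {
            throw new IllegalStateException("reader could not acquire");
        }
        idx.release(); // cache evicts the entry while the reader still holds it
        System.out.println("closed after eviction: " + idx.isClosed());
        idx.release(); // reader finishes: count reaches zero, close runs now
        System.out.println("closed after reader release: " + idx.isClosed());
        System.out.println("acquire after close: " + idx.tryAcquire());
    }
}
{code}

In this sketch eviction only drops the cache's own reference, so an in-flight read can never observe a closed buffer, and a reader arriving after the close is told to reload instead of reading freed memory. Whether the 3.8.1 cache already guards reads this way is not something the sketch establishes.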
We re-ran the Kafka process with additional -XX flags to identify the exact
line that triggered the fatal error and found the following:
{code:java}
J 6032 c2 java.nio.DirectByteBuffer.getInt(I)I java.base@17.0.14 (28 bytes) @ 0x00007927ad2f80f1 [0x00007927ad2f80a0+0x0000000000000051]
j org.apache.kafka.storage.internals.log.OffsetIndex.relativeOffset(Ljava/nio/ByteBuffer;I)I+5
j org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/OffsetPosition;+11
j org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/IndexEntry;+3
j org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I+30
j org.apache.kafka.storage.internals.log.AbstractIndex.indexSlotRangeFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;)I+126
j org.apache.kafka.storage.internals.log.AbstractIndex.smallestUpperBoundSlotFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;)I+8
j org.apache.kafka.storage.internals.log.OffsetIndex.lambda$fetchUpperBoundOffset$2(Lorg/apache/kafka/storage/internals/log/OffsetPosition;I)Ljava/util/Optional;+20
J 36910 c2 kafka.log.remote.RemoteLogManager.read(Lorg/apache/kafka/storage/internals/log/RemoteStorageFetchInfo;)Lorg/apache/kafka/storage/internals/log/FetchDataInfo; (624 bytes) @ 0x00007927af7d1190 [0x00007927af7cff60+0x0000000000001230]
J 37034 c2 kafka.log.remote.RemoteLogReader.call()Ljava/lang/Void; (262 bytes) @ 0x00007927af82a2e4 [0x00007927af82a1a0+0x0000000000000144]
J 27891% c2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@17.0.14 (187 bytes) @ 0x00007927ae93cf4c [0x00007927ae93c740+0x000000000000080c]
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@17.0.14
j java.lang.Thread.run()V+11 java.base@17.0.14
v ~StubRoutines::call_stub
V [libjvm.so+0x85aed4] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x334
V [libjvm.so+0x85c9bc] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x20c
V [libjvm.so+0x91ce50] thread_entry(JavaThread*, JavaThread*)+0x70
V [libjvm.so+0xee3e37] JavaThread::run()+0x127
V [libjvm.so+0xee6f61] Thread::call_run()+0xa1
V [libjvm.so+0xc54d33] thread_native_entry(Thread*)+0xe3
C [libc.so.6+0x9caa4]
{code}
I've attached the full error log as hs_err_pid1507409-redacted.log.
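For context on the top frames: OffsetIndex stores fixed-size 8-byte entries (a 4-byte offset relative to the segment's base offset, followed by a 4-byte file position), and the binary search reads them with absolute getInt calls on the index buffer. The sketch below is a simplified, self-contained illustration of that lookup, finding the smallest slot whose offset is greater than or equal to the target, as the name smallestUpperBoundSlotFor suggests. It is not Kafka's implementation, and OffsetIndexSketch is a hypothetical name. In the trace above, the equivalent getInt on the memory-mapped index buffer is where the SIGSEGV occurred.

{code:java}
import java.nio.ByteBuffer;

/**
 * Simplified sketch (not Kafka code) of a binary search over an offset index.
 * Assumed layout: 8-byte entries = 4-byte relative offset + 4-byte position.
 */
public class OffsetIndexSketch {
    static final int ENTRY_SIZE = 8;

    static int relativeOffset(ByteBuffer buf, int slot) {
        // Absolute read; on a stale memory mapping this is where a crash would surface.
        return buf.getInt(slot * ENTRY_SIZE);
    }

    static int physicalPosition(ByteBuffer buf, int slot) {
        return buf.getInt(slot * ENTRY_SIZE + 4);
    }

    /** Smallest slot whose relative offset is >= target, or -1 if none. */
    static int smallestUpperBoundSlot(ByteBuffer buf, int entries, int target) {
        int lo = 0;
        int hi = entries - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (relativeOffset(buf, mid) >= target) {
                result = mid; // candidate; keep looking to the left
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Off-heap buffer like the DirectByteBuffer in the trace, but not file-backed.
        ByteBuffer buf = ByteBuffer.allocateDirect(3 * ENTRY_SIZE);
        int[][] entries = {{0, 0}, {10, 4096}, {20, 8192}};
        for (int[] e : entries) {
            buf.putInt(e[0]).putInt(e[1]);
        }
        int slot = smallestUpperBoundSlot(buf, entries.length, 15);
        System.out.println("slot=" + slot + " offset=" + relativeOffset(buf, slot)
                + " position=" + physicalPosition(buf, slot));
    }
}
{code}

Because every comparison is a raw read from the buffer, nothing in the search itself can detect that the mapping behind it was released; the guard has to come from whoever owns the buffer's lifetime.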
> Enabling S3 Tiered Storage Causes: A fatal error has been detected by the Java Runtime Environment
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-18753
> URL: https://issues.apache.org/jira/browse/KAFKA-18753
> Project: Kafka
> Issue Type: Bug
> Components: Tiered-Storage
> Affects Versions: 3.8.1
> Environment: Current:
> Linux 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec 9 23:59:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
> OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build 17.0.14+7-LTS)
> Reporter: Hasil Sharma
> Priority: Major
> Attachments: hs_err_pid1507409-redacted.log, hs_err_pid2775295 - redacted full.log
>
>
> Allowing brokers to upload to S3 as part of the S3 Tiered Storage rollout
> occasionally results in errors like the following:
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000075a38ea42564, pid=2775295, tid=2901446
> #
> # JRE version: OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build 17.0.14+7-LTS)
> # Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (17.0.14+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # J 26432 c2 org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I (161 bytes) @ 0x000075a38ea42564 [0x000075a38ea421c0+0x00000000000003a4]
> #
> # Core dump will be written. Default location: Core dumps may be processed with "/usr/local/bin/crash-handler -b '%e' -m 1 -d /pay/crash -p '%u.%p.%t' -P '%P'" (or dumping to /pay/deploy/kafka-brokers-kafkapub-northwest-green/deploy-1737677684489251978/core.2775295)
> #
> # If you would like to submit a bug report, please visit:
> # https://github.com/corretto/corretto-17/issues/
> # {code}
>
> We ran into a similar error with JDK 11 and upgraded to JDK 17, but the error
> still occurs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)