[
https://issues.apache.org/jira/browse/KAFKA-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925976#comment-17925976
]
Jorge Esteban Quilcate Otoya commented on KAFKA-18753:
------------------------------------------------------
The heap dump seems to show that this issue is happening while reading (not
uploading) segments from S3.
The problematic code seems to be around here:
[https://github.com/apache/kafka/blob/cf7029c0264fd7f7b15c2e98acc874ec8c3403f2/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1764-L1791]
where the offset index is fetched and cached locally to then look up positions,
and then aborted transactions are collected (which may read and cache a long
sequence of remote log indexes).
From the heap dump, a large number of indexes are mapped.
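For readers unfamiliar with this lookup, here is a minimal sketch of the kind
of binary search an offset index supports. It is not the actual
AbstractIndex.binarySearch code: the class and method names are hypothetical,
the 8-byte entry layout (4-byte relative offset, 4-byte file position) is an
assumption for illustration, and a heap ByteBuffer stands in for the
memory-mapped index file.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not Kafka's implementation.
public class OffsetIndexSketch {
    // Assumed entry layout: 4-byte relative offset followed by 4-byte file position.
    public static final int ENTRY_SIZE = 8;

    // Builds an in-memory index from {relativeOffset, position} pairs sorted by offset.
    public static ByteBuffer buildIndex(int[][] entries) {
        ByteBuffer buf = ByteBuffer.allocate(entries.length * ENTRY_SIZE);
        for (int[] e : entries) {
            buf.putInt(e[0]);
            buf.putInt(e[1]);
        }
        return buf;
    }

    // Returns the slot of the last entry whose relative offset is <= target, or -1 if none.
    public static int largestLowerBoundSlot(ByteBuffer idx, int entries, int target) {
        if (entries == 0 || idx.getInt(0) > target) {
            return -1;
        }
        int lo = 0;
        int hi = entries - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (idx.getInt(mid * ENTRY_SIZE) <= target) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Returns the file position to start scanning from for the given relative offset.
    public static int positionFor(ByteBuffer idx, int entries, int target) {
        int slot = largestLowerBoundSlot(idx, entries, target);
        return slot < 0 ? 0 : idx.getInt(slot * ENTRY_SIZE + 4);
    }

    public static void main(String[] args) {
        ByteBuffer idx = buildIndex(new int[][] {{0, 0}, {50, 4096}, {120, 9000}});
        System.out.println(positionFor(idx, 3, 75));
    }
}
```

With remote segments, each index has to be fetched and mapped locally before a
lookup like this can run, which is why a long scan can touch many cached
indexes.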
Increasing the heap size to accommodate more indexes, and increasing the remote
index cache size to reduce the rate of cache eviction, may help here. In
general, tiered storage requires a bit more room (more so on the fetching side)
to operate efficiently.
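As a rough sketch of the cache tuning, the broker property below controls the
total size of the local remote index cache. The 4 GiB value is only an
illustrative assumption, not a recommendation; the right number depends on how
many remote segments consumers read concurrently. Heap size is set separately,
typically through the KAFKA_HEAP_OPTS environment variable used by the broker
start script.

```properties
# server.properties (broker)
# Total size of the local cache for remote log index files.
# Default is 1073741824 (1 GiB); the value below is an illustrative assumption.
remote.log.index.file.cache.total.size.bytes=4294967296
```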
A few questions:
Could you confirm there are consumers reading from the beginning? (e.g. check
[RemoteFetchBytesPerSec|https://kafka.apache.org/documentation/#tiered_storage_monitoring]
metric)
Is the remote index cache size tuned, or left at the default (1 GB)?
> Enabling S3 Tiered Storage Causes: A fatal error has been detected by the
> Java Runtime Environment
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-18753
> URL: https://issues.apache.org/jira/browse/KAFKA-18753
> Project: Kafka
> Issue Type: Bug
> Components: Tiered-Storage
> Affects Versions: 3.8.1
> Environment: Current:
> Linux 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec 9 23:59:34 UTC 2024 x86_64
> x86_64 x86_64 GNU/Linux
> OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build
> 17.0.14+7-LTS)
> Reporter: Hasil Sharma
> Priority: Major
> Attachments: hs_err_pid2775295 - redacted full.log
>
>
> Allowing brokers to upload to S3 as part of the S3 Tiered Storage rollout
> occasionally results in errors like the one below:
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000075a38ea42564, pid=2775295, tid=2901446
> #
> # JRE version: OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7)
> (build 17.0.14+7-LTS)
> # Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (17.0.14+7-LTS,
> mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc,
> linux-amd64)
> # Problematic frame:
> # J 26432 c2
> org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I
> (161 bytes) @ 0x000075a38ea42564 [0x000075a38ea421c0+0x00000000000003a4]
> #
> # Core dump will be written. Default location: Core dumps may be processed
> with "/usr/local/bin/crash-handler -b '%e' -m 1 -d /pay/crash -p '%u.%p.%t'
> -P '%P'" (or dumping to
> /pay/deploy/kafka-brokers-kafkapub-northwest-green/deploy-1737677684489251978/core.2775295)
> #
> # If you would like to submit a bug report, please visit:
> # https://github.com/corretto/corretto-17/issues/
> # {code}
>
> We ran into a similar error with jdk11 and upgraded to jdk17, though the
> error has not stopped.