Masahiro Mori created KAFKA-19390:
-------------------------------------

             Summary: AbstractIndex#resize() does not release old mmap on Linux
                 Key: KAFKA-19390
                 URL: https://issues.apache.org/jira/browse/KAFKA-19390
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 3.8.1
            Reporter: Masahiro Mori


Our Kafka broker crashed with the following error:
[2025-03-29 09:37:03,218] ERROR Error while appending records to <topic>-<partition> in dir /kafka-logs/data ...
java.io.IOException: Map failed
        at java.base/sun.nio.ch.FileChannelImpl.mapInternal(FileChannelImpl.java:1127)
        at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1032)
        at org.apache.kafka.storage.internals.log.AbstractIndex.createMappedBuffer(AbstractIndex.java:467)
        at org.apache.kafka.storage.internals.log.AbstractIndex.createAndAssignMmap(AbstractIndex.java:105)
        at org.apache.kafka.storage.internals.log.AbstractIndex.<init>(AbstractIndex.java:83)
        at org.apache.kafka.storage.internals.log.TimeIndex.<init>(TimeIndex.java:65)
        at org.apache.kafka.storage.internals.log.LazyIndex.loadIndex(LazyIndex.java:242)
        at org.apache.kafka.storage.internals.log.LazyIndex.get(LazyIndex.java:179)
        at org.apache.kafka.storage.internals.log.LogSegment.timeIndex(LogSegment.java:146)
        at org.apache.kafka.storage.internals.log.LogSegment.readMaxTimestampAndOffsetSoFar(LogSegment.java:201)
        at org.apache.kafka.storage.internals.log.LogSegment.maxTimestampSoFar(LogSegment.java:211)
        at org.apache.kafka.storage.internals.log.LogSegment.append(LogSegment.java:262)
        at kafka.log.LocalLog.append(LocalLog.scala:417)
        ...
Caused by: java.lang.OutOfMemoryError: Map failed
        at java.base/sun.nio.ch.FileChannelImpl.map0(Native Method)
        at java.base/sun.nio.ch.FileChannelImpl.mapInternal(FileChannelImpl.java:1124)
        ... 33 more
 
We discovered that the Kafka process had hit the vm.max_map_count limit (which
was set to 262144) and that most of the mapped entries corresponded to deleted
index files:
> sudo cat /proc/${KAFKA_PID}/maps | grep deleted
7d8c5cc00000-7d8c5d600000 rw-s 00000000 08:11 202854769                  /kafka-logs/data/topic1-22/00000000332910579773.timeindex.deleted (deleted)
7d8c5d800000-7d8c5e200000 rw-s 00000000 08:11 202854768                  /kafka-logs/data/topic1-22/00000000332910579773.index.deleted (deleted)
7d8c67400000-7d8c67e00000 rw-s 00000000 08:11 202562514                  /kafka-logs/data/topic2-116/00000000165968090794.timeindex.deleted (deleted)
7d8c68000000-7d8c68a00000 rw-s 00000000 08:11 202562513                  /kafka-logs/data/topic2-116/00000000165968090794.index.deleted (deleted)
7d8c6d400000-7d8c6de00000 rw-s 00000000 08:11 202596518                  /kafka-logs/data/topic2-356/00000000168702579081.timeindex.deleted (deleted)
7d8c6e000000-7d8c6ea00000 rw-s 00000000 08:11 202596517                  /kafka-logs/data/topic2-356/00000000168702579081.index.deleted (deleted)
7d8c71c00000-7d8c72600000 rw-s 00000000 08:11 202798981                  /kafka-logs/data/topic3-433/00000000116740630582.timeindex.deleted (deleted)
7d8c72800000-7d8c73200000 rw-s 00000000 08:11 202798980                  /kafka-logs/data/topic3-433/00000000116740630582.index.deleted (deleted)
7d8c77c00000-7d8c78600000 rw-s 00000000 08:11 202754947                  /kafka-logs/data/topic3-74/00000000118067749684.timeindex.deleted (deleted)
7d8c78800000-7d8c79200000 rw-s 00000000 08:11 202754946                  /kafka-logs/data/topic3-74/00000000118067749684.index.deleted (deleted)
7d8c79400000-7d8c79e00000 rw-s 00000000 08:11 202813710                  /kafka-logs/data/topic2-82/00000000162756700035.timeindex.deleted (deleted)
7d8c7a000000-7d8c7aa00000 rw-s 00000000 08:11 202813709                  /kafka-logs/data/topic2-82/00000000162756700035.index.deleted (deleted)
7d8c7ac00000-7d8c7b600000 rw-s 00000000 08:11 202596526                  /kafka-logs/data/topic2-355/00000000169939763750.timeindex.deleted (deleted)
7d8c7b800000-7d8c7c200000 rw-s 00000000 08:11 202596525                  /kafka-logs/data/topic2-355/00000000169939763750.index.deleted (deleted)
7d8c7c400000-7d8c7ce00000 rw-s 00000000 08:11 202562498                  /kafka-logs/data/topic2-295/00000000168913981903.timeindex.deleted (deleted)
7d8c7d000000-7d8c7da00000 rw-s 00000000 08:11 202562497                  /kafka-logs/data/topic2-295/00000000168913981903.index.deleted (deleted)
7d8c80c00000-7d8c81600000 rw-s 00000000 08:11 202754939                  /kafka-logs/data/topic3-13/00000000115588098896.timeindex.deleted (deleted)
7d8c81800000-7d8c82200000 rw-s 00000000 08:11 202754938                  /kafka-logs/data/topic3-13/00000000115588098896.index.deleted (deleted)
7d8c83c00000-7d8c84600000 rw-s 00000000 08:11 202798989                  /kafka-logs/data/topic3-314/00000000118254254601.timeindex.deleted (deleted)
7d8c84800000-7d8c85200000 rw-s 00000000 08:11 202798988                  /kafka-logs/data/topic3-314/00000000118254254601.index.deleted (deleted)
...
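
For anyone who wants to check a broker for the same symptom, the following is a
minimal sketch that counts how many entries in a /proc/<pid>/maps listing point
at deleted files, so the number can be compared against vm.max_map_count. The
class and method names (DeletedMapCounter, countDeleted) are made up for this
illustration and are not part of Kafka. It assumes a Linux host and enough
privileges to read the target process's maps file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Kafka.
class DeletedMapCounter {
    // Counts lines of a /proc/<pid>/maps listing whose pathname is marked "(deleted)".
    static long countDeleted(List<String> mapsLines) {
        return mapsLines.stream()
                .filter(line -> line.trim().endsWith("(deleted)"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current process; pass the broker PID as the first argument.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines = Files.readAllLines(Path.of("/proc", pid, "maps"));
        System.out.println("total mappings:   " + lines.size());
        System.out.println("deleted mappings: " + countDeleted(lines));
    }
}
```

A deleted count that keeps growing toward the vm.max_map_count value would be
consistent with the behaviour described in this report.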
 
In AbstractIndex.resize(), the old memory mapping is explicitly unmapped on
Windows and z/OS using safeForceUnmap(), but on Linux the unmapping step is
skipped, so the old mapping stays in place until the garbage collector
reclaims the buffer.
The same issue was originally reported in KAFKA-7442, but the corresponding
pull request was never merged.
We propose that resize() call safeForceUnmap() on all platforms to
prevent stale mappings from lingering.
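
To illustrate the proposal, here is a rough, self-contained sketch of a resize
step that releases the old mapping unconditionally before remapping. It is not
the Kafka implementation: RemapSketch, forceUnmap and this resize signature are
hypothetical stand-ins, and forceUnmap relies on sun.misc.Unsafe.invokeCleaner,
an unsupported JDK-internal API available since JDK 9. The actual patch would
presumably reuse the existing safeForceUnmap() instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not Kafka code.
class RemapSketch {
    // Stand-in for safeForceUnmap(): releases the mapping right away instead of
    // waiting for the garbage collector to reclaim the buffer.
    static void forceUnmap(MappedByteBuffer buffer) throws ReflectiveOperationException {
        Field f = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Object unsafe = f.get(null);
        unsafe.getClass().getMethod("invokeCleaner", ByteBuffer.class).invoke(unsafe, buffer);
    }

    // Changes the file length and returns a fresh mapping. The old mapping, if
    // any, is flushed and released on every platform before the file is resized.
    static MappedByteBuffer resize(Path file, MappedByteBuffer old, long newSize)
            throws IOException, ReflectiveOperationException {
        int position = 0;
        if (old != null) {
            position = old.position();
            old.force();
            forceUnmap(old); // 'old' must not be accessed after this call
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(newSize);
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped =
                    raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, newSize);
            mapped.position((int) Math.min(position, newSize));
            return mapped;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sketch", ".index");
        MappedByteBuffer buf = resize(file, null, 1 << 20);
        buf = resize(file, buf, 2 << 20);
        System.out.println("capacity after resize: " + buf.capacity());
        forceUnmap(buf);
        Files.delete(file);
    }
}
```

The important caveat with any forced unmap is that the old buffer must never be
read or written afterwards, so callers need to guarantee no other thread still
holds a reference to it.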



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
