Mikhail Fesenko created KAFKA-19603:
---------------------------------------
Summary: Change log.segment.bytes configuration type from int to
long to support segments larger than 2GB
Key: KAFKA-19603
URL: https://issues.apache.org/jira/browse/KAFKA-19603
Project: Kafka
Issue Type: Improvement
Components: core, log
Reporter: Mikhail Fesenko
h2. Description
h3. Summary
Change the data type of *{{log.segment.bytes}}* configuration from *{{int}}* to
*{{long}}* to allow segment sizes beyond the current 2GB limit imposed by the
integer maximum value.
h3. Current Limitation
The *{{log.segment.bytes}}* configuration currently uses an *{{int}}* data
type, which limits the maximum segment size to ~2GB (2,147,483,647 bytes). This
constraint becomes problematic for modern high-capacity storage deployments.
h3. Background: Kafka Log Segment Structure
Each Kafka topic partition consists of multiple log segments stored as separate
files on disk. For each segment, Kafka maintains three core files:
* {*}{{.log}} files{*}: Contain the actual message data
* {*}{{.index}} files{*}: Store mappings between message offsets and their
physical positions within the log file, allowing Kafka to quickly locate
messages by their offset without scanning the entire log file
* {*}{{.timeindex}} files{*}: Store mappings between message timestamps and
their corresponding offsets, enabling efficient time-based retrieval of messages
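To make the offset-index structure concrete, here is a minimal sketch (plain Java, not Kafka code; the values are illustrative) of the fixed 8-byte entry layout the offset index uses: a 4-byte offset relative to the segment's base offset, followed by a 4-byte physical byte position in the {{.log}} file. The 4-byte position field is the source of the 2GB ceiling discussed below.

```java
import java.nio.ByteBuffer;

public class IndexEntryDemo {
    public static void main(String[] args) {
        // Each offset-index entry is a fixed 8 bytes:
        // 4-byte relative offset + 4-byte physical file position.
        ByteBuffer entry = ByteBuffer.allocate(8);
        entry.putInt(42);          // offset relative to the segment's base offset
        entry.putInt(1_048_576);   // byte position of that record batch in the .log file
        entry.flip();
        System.out.println("relativeOffset=" + entry.getInt()
                + " position=" + entry.getInt());
    }
}
```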
h3. Motivation
# {*}Modern Hardware Capabilities{*}: Current deployments often use
high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB
segments are inefficiently small
# {*}File Handle Optimization{*}: Large Kafka deployments with many topics can
have 50-100k open files across all segment types ({{.log}}, {{.index}},
{{.timeindex}}). Each segment requires its own set of open file handles, so
larger segments would reduce the total file count and improve caching efficiency
# {*}Performance Benefits{*}: Fewer segment rotations in high-traffic
scenarios would reduce I/O overhead and improve overall performance. Sequential
disk operations are much faster than random access patterns
# {*}Storage Efficiency{*}: Reducing segment file proliferation improves
filesystem metadata performance and reduces inode usage on high-volume
deployments
# {*}Community Interest{*}: Similar requests have been raised in community
forums (see [Confluent forum
discussion|https://forum.confluent.io/t/what-happens-if-i-increase-log-segment-bytes/5845])
h3. Proposed Solution
Change *{{log.segment.bytes}}* from *{{int}}* to *{{long}}* data type, allowing
segment sizes of 3-4GB or larger to better align with modern storage
capabilities.
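As an illustration of the type change (a sketch, not Kafka's actual configuration code; the helper name and the minimum bound are hypothetical), parsing the value as a {{long}} rather than an {{int}} is what lifts the 2,147,483,647-byte ceiling:

```java
public class SegmentBytesConfig {
    // Hypothetical lower bound for illustration; Kafka enforces its own minimum.
    static final long MIN_SEGMENT_BYTES = 1024L * 1024;

    static long parseSegmentBytes(String value) {
        // Long.parseLong accepts values beyond Integer.MAX_VALUE,
        // whereas Integer.parseInt would throw for anything above ~2GB.
        long bytes = Long.parseLong(value);
        if (bytes < MIN_SEGMENT_BYTES) {
            throw new IllegalArgumentException(
                "log.segment.bytes must be at least " + MIN_SEGMENT_BYTES);
        }
        return bytes;
    }

    public static void main(String[] args) {
        // 4 GiB: unrepresentable as an int, valid once the type is long.
        System.out.println(parseSegmentBytes("4294967296"));
    }
}
```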
h3. Technical Considerations (Raised by Community)
Based on dev mailing list discussion:
# {*}Index File Format Limitation{*}: Current index files use 4 bytes to
represent file positions within segments, assuming 2GB cap (Jun Rao). This
means:
** {{.index}} files store offset-to-position mappings using 4-byte integers
for file positions
** If segments exceed 2GB, position values would overflow the 4-byte limit
** Index format may need to be updated to support 8-byte positions
# {*}RemoteLogSegmentMetadata Interface{*}: Public interface currently uses
{{int}} for {{segmentSizeInBytes}} and may need updates (Jun Rao)
# {*}Segment File Ecosystem Impact{*}: Need to evaluate impact on all three
file types (.log, .index, .timeindex) and their interdependencies
# {*}Impact Assessment{*}: Need to evaluate all components that assume 2GB
segment limit
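To make the index overflow concrete, here is a minimal sketch (plain Java, not Kafka code; the 3 GiB segment is hypothetical) of what happens when a file position beyond 2GB is narrowed to the 4-byte {{int}} the current index format stores:

```java
public class PositionOverflowDemo {
    public static void main(String[] args) {
        long segmentSize = 3L * 1024 * 1024 * 1024; // hypothetical 3 GiB segment
        long position = segmentSize - 1;            // byte position near the end

        // The current .index format stores positions as 4-byte ints, so any
        // position above Integer.MAX_VALUE wraps around to a negative value.
        int stored = (int) position;

        System.out.println("actual position: " + position); // 3221225471
        System.out.println("stored as int:   " + stored);   // -1073741825
    }
}
```

Such a negative position would send lookups to an invalid offset in the {{.log}} file, which is why the index format would need widening alongside the configuration type change.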
h3. Questions for Discussion
# What would be a reasonable maximum segment size limit?
# Should this change be backward compatible or require a protocol/format
version bump?
# Are there any other components beyond index files and
RemoteLogSegmentMetadata that need updates?
h3. Expected Benefits
* Reduced number of segment files for high-volume topics
* Improved file handle utilization and caching efficiency
* Better alignment with modern storage hardware capabilities
* Reduced segment rotation overhead in high-traffic scenarios
h3. Acceptance Criteria
* {{log.segment.bytes}} accepts long values > 2GB
* Index file format supports larger segments (if needed)
* RemoteLogSegmentMetadata interface updated (if needed)
* Backward compatibility maintained
* Documentation updated
* Unit and integration tests added
h3. Disclaimer
I'm relatively new to Kafka internals and the JIRA contribution process. The
original idea and motivation came from my experience with large-scale
deployments, but I used Claude AI to help make this ticket more detailed and
technically structured. There may be technical inaccuracies or missing
implementation details that I haven't considered.
This ticket is open for community discussion and feedback before
implementation. Expert review and guidance would be greatly appreciated.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)