I have the same opinion. The following claim is not correct for all Kafka users:

> messages that have already been consumed are typically no longer needed and can be safely deleted. Therefore, I propose enhancing the threshold strategy with an automatic deletion feature.
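To make the objection concrete, here is a minimal sketch of what a hypothetical "delete only consumed records" rule would have to compute for a single partition. It treats the lowest committed offset among the consumer groups known *at that moment* as the truncation point. The function name, group names, and offsets are illustrative assumptions, not Kafka APIs or real data.

```python
from typing import Dict


def consumed_truncation_offset(log_start: int, committed: Dict[str, int]) -> int:
    """Hypothetical rule: records below the smallest committed offset of all
    currently known consumer groups count as 'consumed' and may be deleted.

    Returns the new log start offset for the partition.
    """
    if not committed:
        # No known groups: nothing can be considered consumed yet.
        return log_start
    # Never move the log start backwards.
    return max(log_start, min(committed.values()))


# A partition holding offsets 0..999, standing in for "the last 3 days" of data.
log_start, log_end = 0, 1000
groups = {"billing": 1000, "analytics": 980}

new_start = consumed_truncation_offset(log_start, groups)
print(new_start)            # 980
print(log_end - new_start)  # 20 records left for any group that joins later
```

A consumer group created afterwards (for example with `auto.offset.reset=earliest`) could then read only the last 20 records instead of the full window, which is why "already read by current consumers" is not the same as "no longer needed".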
Currently, there is no configuration to delete only consumed records; the existing retention configurations apply regardless of whether records have been read. And assuming that a record which has been read will never be read again is not correct. For example, I may provide 3 days of data to clients, and a new client added in the future should still be able to receive the last 3 days. It is not correct to delete that data before the new client joins just because the existing clients have already read all of it.

The current KIP proposal is very good and reasonable. We also sometimes run other services on the same disk servers, so we need to be sure that Kafka does not take the entire disk for itself.

On 2025/08/01 10:33:02 peng wrote:
> In most use cases, Kafka serves as a messaging middleware where messages
> that have already been consumed are typically no longer needed and can be
> safely deleted. Therefore, I propose enhancing the threshold strategy with
> an automatic deletion feature:
>
> When a broker's disk usage reaches 95%, it should automatically delete the
> oldest 10% of messages on the node to free up disk space, allowing new
> messages to be produced. This eliminates the need for manual cleanup while
> ensuring that new messages (which are almost always more critical than
> already-consumed data) take priority.
>
> Prevents disk-full scenarios by automatically removing stale data.
> No admin intervention required for basic cleanup.
> Fresh messages are never blocked by obsolete ones.
>
> The only potential risk arises if consumer groups experience significant
> lag, in which case unconsumed messages might be deleted prematurely. However,
> in such cases the root issue is the backlog itself; teams should prioritize
> resolving the lag rather than relying on retention.
>
> To accommodate different needs, we could introduce a
> `disk.threshold.policy` parameter, allowing users to choose between:
> 1. Rejecting new messages
> 2. Automatically deleting the oldest messages
>
> Best regards
>
> mapan <[email protected]> wrote on Thursday, July 31, 2025 at 8:18 PM:
>
> > Hi all,
> >
> > I'd like to start a discussion about a new KIP:
> > https://cwiki.apache.org/confluence/x/Nw9JFg
> >
> > This KIP suggests adding disk threshold configs in Kafka and rejecting new
> > produce requests after reaching the threshold to prevent disk-full failures.
> >
> > This strategy is similar to RocketMQ's diskMaxUsedSpaceRatio config or
> > RabbitMQ's disk_free_limit config, and I hope to implement this strategy
> > in our environment.
> >
> > Please share your feedback, questions, or concerns so we can refine
> > the proposal together.
> >
> > Best regards,
> > mapan
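The two policies discussed in this thread (rejecting writes above a disk threshold, as in the KIP, versus deleting the oldest data, as suggested in the reply) can be sketched as a single broker-side check. This is a minimal illustration under stated assumptions only: the policy values `reject` and `delete_oldest`, the function and class names, and the byte-based approximation of "the oldest 10% of messages" are hypothetical and do not reflect the KIP's actual configuration names or Kafka's code. Since Kafka's retention removes whole log segments, the sketch frees space segment by segment, oldest first.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    base_offset: int   # first offset stored in this (hypothetical) segment
    size_bytes: int


def handle_produce(used: int, capacity: int, segments: List[Segment],
                   policy: str = "reject", threshold: float = 0.95,
                   delete_fraction: float = 0.10) -> Tuple[bool, List[Segment]]:
    """Return (accepted, remaining_segments) for one produce request.

    Hypothetical sketch: 'reject' refuses writes at or above the threshold;
    'delete_oldest' drops the oldest segments until at least delete_fraction
    of the stored bytes have been freed, then accepts the write.
    """
    if used / capacity < threshold:
        return True, segments
    if policy == "reject":
        return False, segments
    if policy == "delete_oldest":
        target = sum(s.size_bytes for s in segments) * delete_fraction
        remaining = list(segments)  # do not mutate the caller's list
        freed = 0
        while remaining and freed < target:
            freed += remaining.pop(0).size_bytes  # oldest segment first
        return True, remaining
    raise ValueError(f"unknown policy: {policy}")


segs = [Segment(i * 100, 100) for i in range(5)]
print(handle_produce(960, 1000, segs, policy="reject")[0])            # False
print(len(handle_produce(960, 1000, segs, policy="delete_oldest")[1]))  # 4
```

Note that `delete_oldest` removes data without checking whether any consumer group, including one added later, still needs it, which is exactly the concern raised in the reply above.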
