Hello everyone. Today, when a data disk fills up and Kafka can no longer write to it, the broker process crashes. In this situation, users cannot use Kafka APIs to recover their cluster and must instead manually free up disk space.
There are multiple approaches to addressing this issue. One behavior users commonly want is graceful degradation: the broker stops accepting new produce requests for the full disk, but continues serving admin APIs so that data on that disk can be deleted. I am wondering what the community's views are on making this the default behavior.

I am sure this discussion has happened multiple times in the past, and I would rather learn from those discussions than reinvent the wheel or jump to a solution without context. What are the reasons this might be challenging to implement? What do you think are the primary obstacles and unknowns?

Regards,
Tirtha Chatterjee