I think it's a great idea to raise a KIP to look at adjusting defaults and minimum/maximum config values for version 4.0.
As pointed out, the minimum values for segment.ms and segment.bytes don't make sense and would probably bring down a cluster pretty quickly if set that low, so version 4.0 is a good time to fix them and to review the other configs for adjustments as well.

On Wed, Mar 13, 2024 at 4:39 AM Sergio Daniel Troiano <sergio.troi...@adevinta.com.invalid> wrote:

> Hey guys,
>
> Regarding num.recovery.threads.per.data.dir: I agree. In our company we
> set it to the number of vCPUs, as this is not competing with ready
> cluster traffic.
>
> On Wed, 13 Mar 2024 at 09:29, Luke Chen <show...@gmail.com> wrote:
>
> > Hi Divij,
> >
> > Thanks for raising this.
> > The valid minimum value of 1 for `segment.ms` is completely unreasonable.
> > The same goes for `segment.bytes`, `metadata.log.segment.ms`, and
> > `metadata.log.segment.bytes`.
> >
> > In addition, there are some config default values we'd like to
> > propose changing in v4.0.
> > We can collect more comments from the community and come up with a KIP
> > for them.
> >
> > 1. num.recovery.threads.per.data.dir:
> > The current default value is 1. But log recovery happens before
> > brokers are in the ready state, which means we should use all the
> > available resources to speed up log recovery and bring the broker to
> > the ready state sooner. The default value should be... maybe 4 (to be
> > decided)?
> >
> > 2. Other configs whose defaults might be worth changing (open for
> > comments):
> > 2.1. num.replica.fetchers: the default is 1, but that's not enough when
> > there are multiple partitions in the cluster.
> > 2.2. `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`:
> > Currently, the default value is 100 KB, but that's not enough for
> > high-speed networks.
> >
> > Thank you.
> > Luke
> >
> > On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya <divijvaidy...@gmail.com>
> > wrote:
> >
> > > Hey folks,
> > >
> > > Before I file a KIP to change this in 4.0, I wanted to understand the
> > > historical context for the value of the following setting.
> > >
> > > Currently, the segment.ms minimum threshold is set to 1 ms [1].
> > >
> > > Segments are expensive. Every segment uses multiple file descriptors,
> > > and it's easy to hit OS limits when creating a large number of
> > > segments. A large number of segments also delays log loading on
> > > startup because of expensive operations such as iterating through all
> > > directories and conditionally loading all producer state.
> > >
> > > I am not currently aware of a reason why someone might want to work
> > > with a segment.ms of less than ~10s (an arbitrarily chosen number that
> > > looks sane).
> > >
> > > What was the historical context of setting the minimum threshold to
> > > 1 ms for this setting?
> > >
> > > [1] https://kafka.apache.org/documentation.html#topicconfigs_segment.ms
> > >
> > > --
> > > Divij Vaidya
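To put rough numbers on why a 1 ms floor for segment.ms is risky, here is a back-of-the-envelope sketch, not a model of Kafka's actual log manager. It assumes (hypothetically) that a partition receives at least one record every segment.ms, so a new segment rolls every segment.ms, and that only time-based retention deletes segments; size-based rolls from segment.bytes are ignored. Under those assumptions the segments kept per partition are roughly retention.ms / segment.ms. The 7-day values are Kafka's documented defaults for retention.ms and segment.ms; the three files per segment (.log, .index, .timeindex) and the helper names are assumptions for illustration only.

```python
# Kafka's documented defaults: retention.ms and segment.ms are both 7 days.
DEFAULT_RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # 604,800,000 ms
DEFAULT_SEGMENT_MS = DEFAULT_RETENTION_MS

# Assumption for illustration: one .log, .index and .timeindex file per segment.
FILES_PER_SEGMENT = 3


def max_retained_segments(retention_ms: int, segment_ms: int) -> int:
    """Rough upper bound on segments kept per partition (hypothetical model).

    Assumes at least one record arrives every segment_ms, so a segment rolls
    every segment_ms, and that only time-based retention deletes segments.
    Size-based rolls (segment.bytes) are ignored.
    """
    if segment_ms <= 0 or retention_ms <= 0:
        raise ValueError("retention_ms and segment_ms must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (retention_ms + segment_ms - 1) // segment_ms


def estimated_files_per_partition(segments: int,
                                  files_per_segment: int = FILES_PER_SEGMENT) -> int:
    """Files on disk for one partition under the assumed files-per-segment count."""
    return segments * files_per_segment


if __name__ == "__main__":
    # Compare the current 1 ms floor, the ~10 s floor suggested above, and the default.
    for segment_ms in (1, 10_000, DEFAULT_SEGMENT_MS):
        segments = max_retained_segments(DEFAULT_RETENTION_MS, segment_ms)
        files = estimated_files_per_partition(segments)
        print(f"segment.ms={segment_ms}: up to {segments:,} segments, "
              f"~{files:,} files per partition")
```

With default retention this gives an upper bound of 604,800,000 segments per partition at segment.ms=1, 60,480 at 10 s, and 1 at the 7-day default. Real counts depend on traffic, since a time-based roll only happens when a new record is appended after segment.ms has elapsed.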