On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote: > Hi all, > > I am a software engineer from Zhihu.com. Kafka is so great and used heavily > in Zhihu. There are probably over 2K Kafka brokers in total. > > However, we are suffering from the problem that the performance degrades > rapidly when the number of topics increases(sadly, we are using HDD).
Hi Yuanjin, How many partitions are you trying to create? Do you have benchmarks confirming that disk I/O is your bottleneck? There are a few cases where large numbers of partitions may impose CPU and garbage collection burdens. The patch on https://github.com/apache/kafka/pull/5206 illustrates one of them. > We are considering separating the logic layer and the storage layer of Kafka > broker like Apache Pulsar. > > After the modification, a server may have several Kafka brokers and more > topics. Those brokers all connect to a sole storage engine via RP The > sole storage can do the load balancing work easily, and avoid creating too > many files which hurts HDD. > > Is it hard? I think replacing the stuff in `Kafka.Log` would be enough, > right? It would help to know what the problem is here. If the problem is a large number of files, then maybe the simplest approach would be creating fewer files. You don't need to introduce a new layer of servers in order to do that. You could use something like RocksDB to store messages and indices, or create your own file format which combined together things which were previously separate. For example, we could combine the timeindex and index files. As I understand it, Pulsar made the decision to combine together data from multiple partitions in a single file. Sometimes a very large number of partitions. This is great for writing, but not so good if you want to read historical data from a single topic. regards, Colin