On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote:
> Hi all,
> 
> I am a software engineer from Zhihu.com. Kafka is so great and used heavily
> in Zhihu. There are probably over 2K Kafka brokers in total.
> 
> However, we are suffering from the problem that the performance degrades
> rapidly when the number of topics increases (sadly, we are using HDDs).

Hi Yuanjin,

How many partitions are you trying to create?

Do you have benchmarks confirming that disk I/O is your bottleneck?  There are 
a few cases where large numbers of partitions may impose CPU and garbage 
collection burdens.  The patch on https://github.com/apache/kafka/pull/5206 
illustrates one of them.

> We are considering separating the logic layer and the storage layer of Kafka
> broker like Apache Pulsar.
> 
> After the modification, a server may have several Kafka brokers and more
> topics. Those brokers all connect to a sole storage engine via RPC. The
> sole storage can do the load balancing work easily, and avoid creating too
> many files which hurts HDD.
> 
> Is it hard? I think replacing the stuff in `Kafka.Log` would be enough,
> right?

It would help to know what the problem is here.  If the problem is a large 
number of files, then maybe the simplest approach would be creating fewer 
files.  You don't need to introduce a new layer of servers in order to do that.
You could use something like RocksDB to store messages and indices, or create 
your own file format that combines things which were previously separate.  
For example, we could combine the timeindex and index files.
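
To make that last idea a bit more concrete, here is a rough Python sketch of 
what a single combined index could look like.  The 16-byte entry layout and 
all of the function names are made up for illustration; this is not Kafka's 
on-disk format, and it assumes timestamps never decrease from one entry to 
the next.

```python
import bisect
import struct

# Hypothetical combined entry (an assumption, not Kafka's format):
# relative offset (4 bytes), file position (4 bytes), timestamp in ms (8 bytes).
ENTRY = struct.Struct(">iiq")


def pack_entries(entries):
    """Serialize (relative_offset, position, timestamp_ms) tuples into one buffer."""
    return b"".join(ENTRY.pack(*e) for e in entries)


def unpack_entries(buf):
    """Parse a buffer produced by pack_entries back into a list of tuples."""
    return [ENTRY.unpack_from(buf, i) for i in range(0, len(buf), ENTRY.size)]


def lookup_by_offset(entries, target):
    """Return the file position of the last entry whose offset is <= target."""
    offsets = [e[0] for e in entries]
    i = bisect.bisect_right(offsets, target) - 1
    return entries[i][1] if i >= 0 else 0


def lookup_by_timestamp(entries, ts):
    """Return the offset of the first entry with timestamp >= ts, or None."""
    stamps = [e[2] for e in entries]
    i = bisect.bisect_left(stamps, ts)
    return entries[i][0] if i < len(entries) else None
```

With a layout like this, one file per segment answers both kinds of lookup, 
so each segment needs one index file instead of two.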

As I understand it, Pulsar made the decision to combine data from multiple 
partitions in a single file, sometimes from a very large number of 
partitions.  This is great for writing, but not so good if you want to read 
historical data from a single topic.
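
A toy model of that trade-off, in Python.  The class names are hypothetical 
and this is not how Pulsar or Kafka is implemented; a real system would keep 
an index rather than scanning.  The sketch only counts how many records a 
naive reader has to pass over in each layout.

```python
from collections import defaultdict


class InterleavedLog:
    """One shared append-only log holding records from every partition."""

    def __init__(self):
        self.records = []  # stands in for a single file on disk

    def append(self, partition, payload):
        self.records.append((partition, payload))

    def read_partition(self, partition):
        """Return (payloads, records_scanned) for one partition."""
        scanned = 0
        out = []
        for p, payload in self.records:
            scanned += 1  # every record is visited, wanted or not
            if p == partition:
                out.append(payload)
        return out, scanned


class PerPartitionLog:
    """One log per partition, as in the many-files layout."""

    def __init__(self):
        self.files = defaultdict(list)  # partition -> its own "file"

    def append(self, partition, payload):
        self.files[partition].append(payload)

    def read_partition(self, partition):
        """Return (payloads, records_scanned) for one partition."""
        data = self.files.get(partition, [])
        return list(data), len(data)
```

With three partitions and four records each, reading one partition from the 
shared log passes over all twelve records, while the per-partition layout 
touches only the four that were asked for.  The gap grows with the number of 
partitions sharing a file.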

regards,
Colin
