Re: How many partition can one single machine handle in Kafka?

2014-10-24 Thread Xiaobin She
Todd, Thank you very much for your reply. My understanding of RAID 10 was wrong. I understand that one cannot get absolutely sequential disk access even on a single disk. The reason I'm interested in this question is that the Kafka design document emphasizes that Kafka takes advantage of the

Re: How many partition can one single machine handle in Kafka?

2014-10-24 Thread Todd Palino
Hmm, I haven't read the design doc lately, but I'm surprised that there's even a discussion of sequential disk access. I suppose for small subsets of the writes you can write larger blocks of sequential data, but that's about the extent of it. Maybe one of the developers can speak more to that

Re: How many partition can one single machine handle in Kafka?

2014-10-24 Thread Gwen Shapira
Todd, Did you load-test using SSDs? Got numbers to share? On Fri, Oct 24, 2014 at 10:40 AM, Todd Palino tpal...@gmail.com wrote: Hmm, I haven't read the design doc lately, but I'm surprised that there's even a discussion of sequential disk access. I suppose for small subsets of the writes you

Re: How many partition can one single machine handle in Kafka?

2014-10-24 Thread Todd Palino
We haven't done any testing of Kafka on SSDs, mostly because our storage density needs are too high. Since our IO load has been fine on the current model, we haven't pushed in that direction yet. Additionally, I haven't done any real load testing since I got here, which is part of why we're going

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread Neil Harkins
I've been thinking about this recently. If kafka provided cmdline hooks to be executed on segment rotation, similar to postgres' wal 'archive_command', configurations could store only the current segments and all their random i/o on flash, then once rotated, copy them sequentially onto
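A minimal sketch of the selection step such a hook would need, under stated assumptions: Kafka names each log segment after its zero-padded base offset (for example `00000000000000000000.log`), and the segment with the highest base offset is taken to be the active one. The function name `rotated_segments` and the sample file names are hypothetical, and no such hook is implied to exist in Kafka itself.

```python
from pathlib import PurePath


def rotated_segments(filenames):
    """Return the .log segment files that are no longer active, oldest first.

    Assumption: the active segment is the one with the highest base offset,
    so every other .log file in the partition directory has been rotated.
    """
    logs = [name for name in filenames if name.endswith(".log")]
    # The file stem is the base offset, so sort numerically on it.
    logs.sort(key=lambda name: int(PurePath(name).stem))
    # Everything except the last (active) segment is a candidate to archive.
    return logs[:-1]


# Hypothetical directory listing for one partition.
names = [
    "00000000000000737337.log",
    "00000000000000000000.log",
    "00000000000000000000.index",
    "00000000000000368769.log",
]
print(rotated_segments(names))
```

A real archiver would also have to carry the matching index files and cope with the broker deleting or rolling segments while the copy is running; neither is handled here.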

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread Todd Palino
I've mentioned this a couple times in discussions recently as well. We were discussing the concept of infinite retention for a certain type of service, and how it might be accomplished. My suggestion was to have a combination of storage types and the ability for Kafka to look for segments in two

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread István
This is actually a very vague statement and does not cover every use case. Having a RAID10 array of 6x250G SSDs is very different from having 4x1T spinning drives. In my experience rebuilding a raid10 array that has several smaller SSD disks is hardly noticeable from the service point of view,

Re: How many partition can one single machine handle in Kafka?

2014-10-23 Thread István
RAID has nothing to do with the overall availability of your system; it only increases per-node reliability. Regards, Istvan On Wed, Oct 22, 2014 at 11:01 AM, Gwen Shapira gshap...@cloudera.com wrote: RAID-10? Interesting choice for a system where the data is already replicated

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Todd Palino
The number of brokers doesn't really matter here, as far as I can tell, because the question is about what a single broker can handle. The number of partitions in the cluster is governed by the ability of the controller to manage the list of partitions for the cluster, and the ability of each

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Todd Palino
In fact there are many more than 4000 open files. Many of our brokers run with 28,000+ open files (regular file handles, not network connections). In our case, we're beefing up the disk performance as much as we can by running in a RAID-10 configuration with 14 disks. -Todd On Tue, Oct 21, 2014

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Gwen Shapira
RAID-10? Interesting choice for a system where the data is already replicated between nodes. Is it to avoid the cost of large replication over the network? how large are these disks? On Wed, Oct 22, 2014 at 10:00 AM, Todd Palino tpal...@gmail.com wrote: In fact there are many more than 4000 open

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Jonathan Weeks
There are various costs when a broker fails, including broker leader election for each partition, etc., as well as exposing possible issues for in-flight messages, and client rebalancing etc. So even though replication provides partition redundancy, RAID 10 on each broker is usually a good

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Gwen Shapira
Makes sense. Thanks :) On Wed, Oct 22, 2014 at 11:10 AM, Jonathan Weeks jonathanbwe...@gmail.com wrote: There are various costs when a broker fails, including broker leader election for each partition, etc., as well as exposing possible issues for in-flight messages, and client rebalancing

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Neha Narkhede
In my experience, RAID 10 doesn't really provide value in the presence of replication. When a disk fails, the RAID resync process is so I/O intensive that it renders the broker useless until it completes. When this happens, you actually have to take the broker out of rotation and move the leaders

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Jonathan Weeks
Neha, Do you mean RAID 10 or RAID 5 or 6? With RAID 5 or 6, recovery is definitely very painful, but less so with RAID 10. We have been using the guidance here: http://www.youtube.com/watch?v=19DvtEC0EbQ#t=190 (LinkedIn Site Reliability Engineers state they run RAID 10 on all Kafka clusters

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Todd Palino
Yeah, Jonathan, I'm the LinkedIn SRE who said that :) And Neha, up until recently, sat 8 feet from my desk. The data from the wiki page is off a little bit as well (we're running 14 disks now, and 64 GB systems) So to hit the first questions, RAID 10 gives higher read performance, and also allows

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Jonathan Weeks
I suppose it also is going to depend on: a) How much spare I/O bandwidth the brokers have as well to support a rebuild while supporting ongoing requests. Our brokers have spare IO capacity. b) How many brokers are in the cluster and what the replication factor is — e.g. if you have a larger

Re: How many partition can one single machine handle in Kafka?

2014-10-22 Thread Xiaobin She
Todd, Thank you for the information. With 28,000+ files and 14 disks, that means there are on average about 4000 open files on every two disks (which are treated as one single disk), am I right? How do you manage to make all the write operations to these 4000 open files sequential on the disk?
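The arithmetic behind that estimate, as a small sketch. It assumes the RAID 10 array is built from two-disk mirrored pairs and that open files are spread evenly over them; the function name is illustrative only. Since RAID 10 stripes data across the mirrored pairs, the result is an average load figure per pair, not a statement about where any one file lives.

```python
def open_files_per_mirror(total_open_files, disks, disks_per_mirror=2):
    """Average number of open files per mirrored pair in a RAID 10 array."""
    if disks <= 0 or disks % disks_per_mirror != 0:
        raise ValueError("disk count must be a positive multiple of the mirror size")
    mirrors = disks // disks_per_mirror  # 14 disks -> 7 mirrored pairs
    return total_open_files / mirrors


# Figures from the thread: 28,000 open files on a 14-disk RAID 10 array.
print(open_files_per_mirror(28_000, 14))  # 4000.0
```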

How many partition can one single machine handle in Kafka?

2014-10-21 Thread Xiaobin She
Hello, everyone. I'm new to Kafka, and I'm wondering what the maximum number of partitions is that one single machine can handle. Is there a suggested number? Thanks. xiaobinshe

Re: How many partition can one single machine handle in Kafka?

2014-10-21 Thread Guozhang Wang
Xiaobin, This FAQ may give you some hints: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic ? On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote: hello, everyone I'm new to kafka, I'm wondering what's the max num of

Re: How many partition can one single machine handle in Kafka?

2014-10-21 Thread Todd Palino
As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues. -Todd On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She xiaobin...@gmail.com wrote: hello, everyone I'm new to
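A rough way to check a planned layout against that cap, as a sketch only. It assumes replicas are spread evenly across brokers, which a real cluster does not guarantee; the topic names, partition counts, and the function name are hypothetical. Only the 4000 figure comes from the thread.

```python
import math

# Cap reported in the thread: 4000 partitions per broker, replicas included.
PARTITION_CAP_PER_BROKER = 4000


def replicas_per_broker(partitions_by_topic, replication_factor, brokers):
    """Average number of partition replicas hosted per broker, rounded up.

    Assumption: replicas are distributed evenly across all brokers.
    """
    if brokers <= 0 or replication_factor <= 0:
        raise ValueError("brokers and replication_factor must be positive")
    total_replicas = sum(partitions_by_topic.values()) * replication_factor
    return math.ceil(total_replicas / brokers)


# Hypothetical topic layout, for illustration only.
topics = {"clicks": 600, "logs": 1200, "metrics": 300}
load = replicas_per_broker(topics, replication_factor=3, brokers=2)
print(load, "replicas per broker; within cap:", load <= PARTITION_CAP_PER_BROKER)
```

Counting replicas rather than just leader partitions matches the way the cap is stated above, since every replica is a partition directory with its own files on the broker that hosts it.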

Re: How many partition can one single machine handle in Kafka?

2014-10-21 Thread Neil Harkins
On Tue, Oct 21, 2014 at 2:10 PM, Todd Palino tpal...@gmail.com wrote: As far as the number of partitions a single broker can handle, we've set our cap at 4000 partitions (including replicas). Above that we've seen some performance and stability issues. How many brokers? I'm curious: what kinds

Re: How many partition can one single machine handle in Kafka?

2014-10-21 Thread Xiaobin She
Todd, Actually I'm wondering how Kafka handles so many partitions. With one partition there is at least one file on disk, and with 4000 partitions there will be at least 4000 files. When all these partitions have write requests, how does Kafka make the write operations on the disk sequential