Yeah, Jonathan, I'm the LinkedIn SRE who said that :) And Neha, up until
recently, sat 8 feet from my desk. The data from the wiki page is off a
little bit as well (we're running 14 disks now, and 64 GB systems).

So to hit the first questions, RAID 10 gives higher read performance, and
also allows you to suffer a disk failure without having to drop the entire
cluster. As Neha noted, you're going to take a hit on the rebuild, and
because of ongoing traffic in the cluster it will be for a long time (we
can easily take half a day to rebuild a disk). But you still get some
benefit out of the RAID over just killing the data and letting it rebuild
from the replica, because during that time the cluster is not
under-replicated, so you can suffer another failure. The more servers and
disks you have, the more often disks are going to fail, not to mention
other components, both hardware and software. I like running on the safer
side.

That said, I'm not sure RAID 10 is the answer either. We're going to be
doing some experimenting with other disk layouts shortly. We've inherited a
lot of our architecture, and many things have changed in that time. We're
probably going to test out RAID 5 and 6 to start with and see how much we
lose from the parity calculations.
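For a rough sense of the trade-off we'll be testing, here's a back-of-the-envelope comparison of the layouts (14 disks matches our current setup; the 1 TB per-disk size is purely a placeholder):

```python
# Rough usable-capacity / fault-tolerance comparison of the disk layouts
# discussed above. The per-disk size is illustrative, not our hardware.
DISKS = 14
DISK_TB = 1.0

layouts = {
    # name: (usable fraction of raw capacity, disk failures survivable per broker)
    "RAID 10": (0.5, 1),                 # mirrored pairs; 1 guaranteed, more if in different pairs
    "RAID 5": ((DISKS - 1) / DISKS, 1),  # one disk's worth of parity
    "RAID 6": ((DISKS - 2) / DISKS, 2),  # two disks' worth of parity
    "JBOD": (1.0, 0),                    # any disk loss means re-replicating from peers
}

for name, (frac, survivable) in layouts.items():
    print(f"{name:8s} usable = {DISKS * DISK_TB * frac:4.1f} TB, "
          f"survives {survivable} disk failure(s)")
```

RAID 5/6 clearly win on capacity; the open question for us is what the parity writes cost under Kafka's mostly-sequential workload.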

-Todd


On Wed, Oct 22, 2014 at 3:59 PM, Jonathan Weeks <jonathanbwe...@gmail.com>
wrote:

> Neha,
>
> Do you mean RAID 10 or RAID 5 or 6? With RAID 5 or 6, recovery is
> definitely very painful, but less so with RAID 10.
>
> We have been using the guidance here:
>
> http://www.youtube.com/watch?v=19DvtEC0EbQ#t=190 (LinkedIn Site
> Reliability Engineers state they run RAID 10 on all Kafka clusters @34:40
> or so)
>
> Plus: https://cwiki.apache.org/confluence/display/KAFKA/Operations
>
> LinkedIn
> Hardware
> We are using dual quad-core Intel Xeon machines with 24GB of memory. In
> general this should not matter too much, we only see pretty low CPU usage
> at peak even with GZIP compression enabled and a number of clients that
> don't batch requests. The memory is probably more than is needed for
> caching the active segments of the log.
> The disk throughput is important. We have 8x7200 rpm SATA drives in a RAID
> 10 array. In general this is the performance bottleneck, and more disks are
> better. Depending on how you configure flush behavior you may or may
> not benefit from more expensive disks (if you flush often then higher-RPM
> SAS drives may be better).
> OS Settings
> We use Linux. Ext4 is the filesystem and we run using software RAID 10. We
> haven't benchmarked filesystems so other filesystems may be superior.
> We have added two tuning changes: (1) we upped the number of file
> descriptors since we have lots of topics and lots of connections, and (2)
> we upped the max socket buffer size to enable high-performance data
> transfer between data centers (described here).
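(The two tuning changes mentioned there are usually applied along these lines; the numbers below are illustrative placeholders, not the wiki's actual values:)

```
# /etc/security/limits.conf: raise the file descriptor limit for the broker user
kafka  soft  nofile  100000
kafka  hard  nofile  100000

# /etc/sysctl.d/kafka.conf: larger max socket buffers for cross-datacenter links
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
```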
>
>
> Best Regards,
>
> -Jonathan
>
>
>
> On Oct 22, 2014, at 3:44 PM, Neha Narkhede <neha.narkh...@gmail.com>
> wrote:
>
> > In my experience, RAID 10 doesn't really provide value in the presence of
> > replication. When a disk fails, the RAID resync process is so I/O
> > intensive that it renders the broker useless until it completes. When
> > this happens, you actually have to take the broker out of rotation and
> > move the leaders off of it to prevent it from serving requests in a
> > degraded state. You might as well shut down the broker, delete the
> > broker's data and let it catch up from the leader.
> >
> > On Wed, Oct 22, 2014 at 11:20 AM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> >
> >> Makes sense. Thanks :)
> >>
> >> On Wed, Oct 22, 2014 at 11:10 AM, Jonathan Weeks
> >> <jonathanbwe...@gmail.com> wrote:
> >>> There are various costs when a broker fails, including leader election
> >>> for each partition, possible issues for in-flight messages, client
> >>> rebalancing, etc.
> >>>
> >>> So even though replication provides partition redundancy, RAID 10 on
> >>> each broker is usually a good tradeoff: it guards against the most
> >>> common cause of broker failure (i.e. disk failure) and makes for
> >>> smoother operation overall.
> >>>
> >>> Best Regards,
> >>>
> >>> -Jonathan
> >>>
> >>>
> >>> On Oct 22, 2014, at 11:01 AM, Gwen Shapira <gshap...@cloudera.com>
> >> wrote:
> >>>
> >>>> RAID-10?
> >>>> Interesting choice for a system where the data is already replicated
> >>>> between nodes. Is it to avoid the cost of large replication over the
> >>>> network? how large are these disks?
> >>>>
> >>>> On Wed, Oct 22, 2014 at 10:00 AM, Todd Palino <tpal...@gmail.com>
> >> wrote:
> >>>>> In fact there are many more than 4000 open files. Many of our brokers
> >>>>> run with 28,000+ open files (regular file handles, not network
> >>>>> connections). In our case, we're beefing up the disk performance as
> >>>>> much as we can by running in a RAID-10 configuration with 14 disks.
> >>>>>
> >>>>> -Todd
> >>>>>
> >>>>> On Tue, Oct 21, 2014 at 7:58 PM, Xiaobin She <xiaobin...@gmail.com>
> >> wrote:
> >>>>>
> >>>>>> Todd,
> >>>>>>
> >>>>>> Actually I'm wondering how Kafka handles so many partitions. With one
> >>>>>> partition there is at least one file on disk, so with 4000 partitions
> >>>>>> there will be at least 4000 files.
> >>>>>>
> >>>>>> When all these partitions have write requests, how does Kafka make the
> >>>>>> write operations on disk sequential (which is emphasized in the design
> >>>>>> document of Kafka) and make sure the disk access is efficient?
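(For reference, the layout the Kafka design document describes looks roughly like this: one directory per partition, and every write is an append to that partition's active segment file. The sketch below is illustrative Python, not Kafka's actual code; sequential I/O comes from the append-only pattern plus the OS page cache batching flushes.)

```python
import os
import tempfile

# Toy model of Kafka's on-disk layout: one directory per topic-partition,
# writes append to that partition's active segment file.
log_dir = tempfile.mkdtemp()

def append(topic: str, partition: int, message: bytes) -> int:
    """Append a message to the partition's active segment; return its offset."""
    part_dir = os.path.join(log_dir, f"{topic}-{partition}")
    os.makedirs(part_dir, exist_ok=True)
    segment = os.path.join(part_dir, "00000000000000000000.log")
    with open(segment, "ab") as f:       # append mode: position is end of file
        offset = f.tell()                # byte position doubles as a toy offset
        f.write(message + b"\n")
    return offset

first = append("clicks", 0, b"msg-a")
second = append("clicks", 0, b"msg-b")
other = append("clicks", 1, b"msg-a")    # different partition, different file
```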
> >>>>>>
> >>>>>> Thank you for your reply.
> >>>>>>
> >>>>>> xiaobinshe
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2014-10-22 5:10 GMT+08:00 Todd Palino <tpal...@gmail.com>:
> >>>>>>
> >>>>>>> As far as the number of partitions a single broker can handle, we've
> >>>>>>> set our cap at 4000 partitions (including replicas). Above that we've
> >>>>>>> seen some performance and stability issues.
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>>
> >>>>>>> On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She <xiaobin...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> hello, everyone
> >>>>>>>>
> >>>>>>>> I'm new to Kafka. I'm wondering, what's the max number of partitions
> >>>>>>>> one single machine can handle in Kafka?
> >>>>>>>>
> >>>>>>>> Is there a suggested number?
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> xiaobinshe
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
>
>
