Hi Pete,

Yes, you are right: both nodes have all of the data. I was just wondering
what happens when you lose one node; in production that might not fly.
If this is for testing only, you are good.

To answer your question: I think the retention policy (log.retention.hours)
is what controls disk utilization. Disk IO (the log.flush.* settings) and
network IO (num.network.threads, etc.) saturation are things you might want
to measure during tests, and then spec the hardware based on the results.
Here is a link with examples and more description of the relevant settings:
https://kafka.apache.org/08/ops.html

I guess the most important question is how many clients you want to
support. You could work out how much space you need based on that, after
making a few assumptions (such as the write rate per client and how long
you retain data). For more complete documentation, refer to:
https://kafka.apache.org/08/configuration.html
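
To make the sizing arithmetic concrete, here is a rough back-of-the-envelope
sketch in Python. It is not an official Kafka formula, just the reasoning
above written down. The function names, the 20% headroom, and the example
numbers (10 MB/s of writes, 7 days of retention) are all assumptions for
illustration. It also assumes partitions are spread evenly across brokers and
ignores index files and filesystem overhead.

```python
def usable_capacity_tb(broker_count, disk_tb_per_broker, replication_factor):
    """Unique (non-replicated) data the cluster can hold, in TB.

    Every message is stored replication_factor times, so the raw
    capacity is divided by that factor.
    """
    return broker_count * disk_tb_per_broker / replication_factor


def required_disk_per_broker_tb(write_mb_per_sec, retention_hours,
                                replication_factor, broker_count,
                                headroom=0.2):
    """Estimated disk needed on each broker, in decimal TB.

    headroom is an assumed safety margin (20% by default) so the
    disks are not sized to run completely full.
    """
    retained_mb = write_mb_per_sec * retention_hours * 3600
    replicated_mb = retained_mb * replication_factor
    per_broker_mb = replicated_mb / broker_count
    return per_broker_mb * (1 + headroom) / 1_000_000


if __name__ == "__main__":
    # Two 10TB brokers with every partition replicated on both nodes.
    print(usable_capacity_tb(2, 10, 2))
    # Hypothetical load: 10 MB/s aggregate writes, 168 hours retention.
    print(required_disk_per_broker_tb(10, 168, 2, 2))
```

With two 10TB brokers and a replication factor of 2, the first function gives
10TB of unique data for the whole cluster, which matches the point that both
nodes hold everything. Measuring your real write rate during tests and
plugging it in should give a more honest number than my example values.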

Regards,
Istvan






On Tue, Oct 21, 2014 at 1:22 PM, Pete Wright <pwri...@rubiconproject.com>
wrote:

> Thanks Istvan - I think I understand what you are say here - although I
> was under the impression that if I ensured each topic was being
> replicated N+1 times a two node cluster would ensure each node has a
> copy of the entire contents of the message bus at any given time.
>
> I agree with your assessment though that having 3 nodes is a more
> durable configuration, but was hoping others could explain how they
> calculate capacity and scaling issues on their storage subsystems.
>
> Cheers,
> -pete
>
> On 10/21/14 11:28, István wrote:
> > One thing that you have to keep in mind is that moving 10T between nodes
> > takes long time. If you have a node failure and you need to rebuild
> > (resync) the data your system is going to be vulnerable against the
> > second node failure. You could mitigate this with using raid. I think
> > generally speaking 3 node clusters are better for production purposes.
> >
> > I.
> >
> > On Tue, Oct 21, 2014 at 11:12 AM, Pete Wright <pwri...@rubiconproject.com>
> > wrote:
> >
> >> Hi There,
> >>         I have a question regarding sizing disk for kafka brokers.
> >> Let's say I
> >> have systems capable of providing 10TB of storage, and they act as Kafka
> >> brokers.  If I were to deploy two of these nodes, and enable replication
> >> in Kafka, would I actually have 10TB available for my producers to write
> >> to?  Is there any overhead I should be concerned with?
> >>
> >> I guess I am just wanting to make sure that there are not any major
> >> pitfalls in deploying a two-node cluster, versus say a 3-node cluster.
> >>
> >> Any advice or best-practices would be very helpful!
> >>
> >> Thanks in advance,
> >> -pete
> >>
> >>
> >> --
> >> Pete Wright
> >> Systems Architect
> >> Rubicon Project
> >> pwri...@rubiconproject.com
> >> 310.309.9298
> >>
> >
> >
> >
>
> --
> Pete Wright
> Systems Architect
> Rubicon Project
> pwri...@rubiconproject.com
> 310.309.9298
>



-- 
the sun shines for all
