Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

Dong Lin Wed, 25 Jan 2017 13:50:39 -0800

Hey Colin,

Good point! Yeah we have actually considered and tested this solution,
which we call one-broker-per-disk. It would work and should require no
major change in Kafka as compared to this JBOD KIP. So it would be a good
short term solution.

But it has a few drawbacks which make it less desirable in the long term.
Assume we have 10 disks on a machine. Here are the problems:

1) Our stress test results show that one-broker-per-disk has 15% lower
throughput.

2) The controller would need to send 10X as many LeaderAndIsrRequest,
UpdateMetadataRequest and StopReplicaRequest messages. This increases the
burden on the controller, which can be the performance bottleneck.

3) Less efficient use of physical resources on the machine. The number of
sockets on each machine will increase by 10X. The number of connections
between any two machines will increase by 100X.

4) Less efficient management of memory and quotas.

5) Rebalancing between disks/brokers on the same machine will be less
efficient and less flexible. A broker has to read data from another broker
on the same machine via a socket. It is also harder to do automatic load
balancing between disks on the same machine in the future.
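
To make the 10X and 100X figures in 3) concrete, here is a rough sketch of
the arithmetic. It assumes a simplified model in which every broker on one
machine holds one connection to every broker on the other machine. The
function name and that one-connection-per-broker-pair model are illustrative
assumptions, not something taken from the KIP or from Kafka itself.

```python
def connections_between_two_machines(brokers_per_machine: int) -> int:
    """Broker-to-broker connections between two machines.

    Hypothetical model: each broker on machine A holds exactly one
    connection to each broker on machine B.
    """
    return brokers_per_machine * brokers_per_machine


# One broker per machine (JBOD, a single broker owns all 10 disks).
jbod = connections_between_two_machines(1)

# One broker per disk, with 10 disks on each machine.
per_disk = connections_between_two_machines(10)

# Ratio of connections between the two setups.
ratio = per_disk // jbod
print(jbod, per_disk, ratio)
```

Under this model the number of brokers per machine grows by 10X, so the
number of broker pairs across two machines grows by 10 * 10 = 100X.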

I will put this and the explanation in the rejected alternative section. I
have a few questions:

- Can you explain why this solution can help avoid scalability bottlenecks?
I actually think it will exacerbate the scalability problem due to 2)
above.
- Why can we push more RPCs with this solution?
- It is true that a garbage collection in one broker would not affect
others. But that is after every broker only uses 1/10 of the memory. Can we
be sure that this will actually help performance?

Thanks,
Dong

On Wed, Jan 25, 2017 at 11:34 AM, Colin McCabe <[email protected]> wrote:

> Hi Dong,
>
> Thanks for the writeup!  It's very interesting.
>
> I apologize in advance if this has been discussed somewhere else.  But I
> am curious if you have considered the solution of running multiple
> brokers per node.  Clearly there is a memory overhead with this solution
> because of the fixed cost of starting multiple JVMs.  However, running
> multiple JVMs would help avoid scalability bottlenecks.  You could
> probably push more RPCs per second, for example.  A garbage collection
> in one broker would not affect the others.  It would be interesting to
> see this considered in the "alternate designs" section, even if you end
> up deciding it's not the way to go.
>
> best,
> Colin
>
>
> On Thu, Jan 12, 2017, at 10:46, Dong Lin wrote:
> > Hi all,
> >
> > We created KIP-112: Handle disk failure for JBOD. Please find the KIP
> > wiki at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD.
> >
> > This KIP is related to KIP-113
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories>:
> > Support replicas movement between log directories. They are needed in
> > order to support JBOD in Kafka. Please help review the KIP. Your
> > feedback is appreciated!
> >
> > Thanks,
> > Dong
>
