I'm with you on that, Jay, although I don't consider your case common. More
common is that there are things on all the disks at their normal retention,
and you add something new. That said, it doesn't really matter because what
you're illustrating is a valid concern. Automatic balancing would probable
alleviate any issues coming from a bad initial placement.

Jumping back an email, yes, it is a really big deal that the entire broker
fails when one mount point fails. It is much better to run with degraded
performance than it is to run with degraded replication, and disks fail
constantly. If I have 10% of my machines offline, Kafka's not going to last
very long at LinkedIn ;)

-Todd


On Sat, Apr 11, 2015 at 11:58 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey Todd,
>
> The problem you pointed out is real. Unfortunately, placing by available
> size at creation time actually makes things worse. The original plan was to
> place new partitions on the disk with the most space, but consider a common
> case:
>  disk 1: 500M
>  disk 2: 0M
> Now say you are creating 10 partitions for what will be a massively large
> topic. You will place them all on disk 2 as it has the most space, but then
> immediately you will discover that that was a bad idea as those partitions
> get huge. I think the current balancing by number of partitions is better
> than this because at least you get basically random assignment.
>
> I think to solve the problem you describe we need to do active
> rebalancing--predicting the size of partitions ahead of time is basically
> impossible.
>
> I think having the controller be unaware of disks is probably a good thing.
> So the controller would balance partitions over servers and the server
> would be responsible for balancing over disks.
>
> I think this kind of balancing is possible though not totally trivial.
> Basically a background thread in LogManager would decide that there was too
> much skew in data assignment (actually debatable whether it is data size or
> I/O throughput that you should optimize) and would then try to rebalance.
> To do the rebalance it would do a background copy of the log from the
> current disk to the new disk, then it would take the partition offline and
> delete the old log, then bring the partition back using the new log and
> catch it back up off the leader.
>
> -Jay
>
> On Thu, Apr 9, 2015 at 8:19 AM, Todd Palino <tpal...@gmail.com> wrote:
>
> > I think this is a good start. We've been discussing JBOD internally, so
> > it's good to see a discussion going externally about it as well.
> >
> > The other big blocker to using JBOD is the lack of intelligent partition
> > assignment logic, and the lack of tools to adjust it. The controller is
> not
> > smart enough to take into account disk usage when deciding to place a
> > partition, which may not be a big concern (at the controller level, you
> > worry about broker usage, not individual mounts). However, the broker is
> > not smart enough to do it either, when looking at the local directories.
> It
> > just round robins.
> >
> > In addition, there is no tool available to move a partition from one
> mount
> > point to another. So in the case that you do have a hot disk, you cannot
> do
> > anything about it without shutting down the broker and doing a manual
> move
> > of the log segments.
> >
> > -Todd
> >
> >
> > On Thu, Apr 9, 2015 at 5:36 AM, Andrii Biletskyi <
> > andrii.bilets...@stealth.ly> wrote:
> >
> > > Hi,
> > >
> > > Let me start discussion thread for KIP-18 - JBOD Support.
> > >
> > > Link to wiki:
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-18+-+JBOD+Support
> > >
> > >
> > > Thanks,
> > > Andrii Biletskyi
> > >
> >
>

Reply via email to