This is a good point. We have discussed this a little bit before. The key
constraint is that with replication factor 1 you can have one, but not both,
of the following: (1) high availability, or (2) correct semantic
partitioning. That is to say, if a particular partition is unavailable, the
producer has no choice but to give up and throw an error or else send the
message elsewhere.
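To make the tradeoff concrete, here is a rough sketch (Java-like, with
hypothetical helper names; this is not the actual producer code) of the
decision the producer faces when the leader for the chosen partition is down:

    // Hypothetical sketch of the producer-side decision, not real Kafka code.
    int partition = partitioner.partition(key, numPartitions);
    if (isAvailable(partition)) {
        send(partition, message);
    } else if (strictPartitioning) {
        // Option (2): keep semantic partitioning, give up availability.
        throw new PartitionUnavailableException(
            "leader for partition " + partition + " is down");
    } else {
        // Option (1): keep availability, give up semantic partitioning.
        send(pickAnyAvailablePartition(), message);
    }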

Obviously replication fixes this by just making the partitions highly
available.

It isn't really correct for us to choose one of these for the user. If they
are depending on partitioning, silently sending data elsewhere may be worse
than giving an error. So the user needs some way to specify which behavior
they want.
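For example (purely illustrative; the actual property name and semantics are
exactly what needs to be worked out in the JIRA below), this could surface as
a producer config:

    Properties props = new Properties();
    props.put("metadata.broker.list", "broker1:9092,broker2:9092");
    // "partitioner.strict" is a made-up name for illustration only:
    //   true  = fail the send if the target partition is unavailable
    //   false = fall back to any available partition
    props.put("partitioner.strict", "true");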

Here is a JIRA where we can work out the details. I suspect this is a
blocker for 0.8:
https://issues.apache.org/jira/browse/KAFKA-691

As a workaround in the meantime you can probably run with replication;
although it sounds like you don't really need it, it shouldn't hurt.
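For example, assuming the admin script currently on the 0.8 branch (the
exact script name and flags may differ depending on your build), creating a
topic with a replication factor of 2 would look something like:

    bin/kafka-create-topic.sh --zookeeper zk1:2181 --topic my-topic \
        --partition 8 --replica 2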

-Jay


On Wed, Jan 9, 2013 at 2:38 AM, Maxime Brugidou
<maxime.brugi...@gmail.com> wrote:

> Hello, I am currently testing the 0.8 branch (and it works quite well). We
> plan not to use the replication feature for now since we don't really need
> it; we can afford to lose data in case of an unrecoverable failure of a
> broker.
>
> However, we really don't want producers/consumers to fail if a broker is
> down. The ideal scenario (which worked in 0.7) is that producers would just
> produce to available partitions and consumers would consume from available
> partitions. If the broker comes back online, the consumer will catch up;
> if not, we can decide to throw away the data.
>
> Is this feasible with 0.8? Right now if I kill a broker it just makes
> everything fail...
>
> Multiple issues will come up:
> - Since the partitions are now set globally and never change, the
> availability of a topic varies depending on where the partitions are located
> - We would need tools to make sure topics are spread out enough and to
> rebalance them accordingly (using the "DDL" I heard about; I'm not sure yet
> how it works. I tried editing the JSON strings in ZK, which somehow works,
> and there's the reassignment admin command too)
>
> That looks rather complicated, or maybe I'm missing something? The model
> that was used in 0.7 looked much easier to operate (it had drawbacks, and
> couldn't do intra-cluster replication, but at least the availability of the
> cluster was much higher).
>
> Thanks in advance for any help/clues,
>
> Maxime
>
