The KIP is updated with more details, including how to handle the situation
where rack information is incomplete.

In the situation where rack information is incomplete but we still want to
continue with the assignment, I have suggested ignoring all rack information
and falling back to the original algorithm. The reason is explained below:

The other options are to assume that each broker without a rack belongs to
its own unique rack, or that all such brokers belong to one "default" rack.
Either way, we are highly likely to end up with an uneven number of brokers
per rack, and it is quite possible that the "made up" racks will have far
fewer brokers. As I explained in the KIP, an uneven number of brokers per
rack leads to an uneven distribution of replicas among brokers (even though
the leader distribution is still even): brokers in a rack with fewer brokers
get more replicas per broker than brokers in racks with more brokers.
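
To put rough numbers on the imbalance, here is an illustrative calculation
(the rack sizes are made up and this is only a sketch, not the actual
assignment code):

object RackImbalanceExample {
  def main(args: Array[String]): Unit = {
    val partitions = 12
    val replicationFactor = 2
    // Uneven racks: 4 brokers on rack-a, only 2 on rack-b.
    val rackSizes = Map("rack-a" -> 4, "rack-b" -> 2)

    // A rack aware assignment spreads the replicas of each partition across
    // racks, so each rack ends up with roughly the same number of replicas.
    val totalReplicas = partitions * replicationFactor
    val replicasPerRack = totalReplicas / rackSizes.size
    rackSizes.foreach { case (rack, brokerCount) =>
      println(s"$rack: ${replicasPerRack.toDouble / brokerCount} replicas per broker")
    }
    // Prints 3.0 for rack-a and 6.0 for rack-b: brokers in the smaller rack
    // carry twice as many replicas, even though leadership can still be even.
  }
}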

Given this, and given that the assignment produced would not be correct from
a rack aware point of view anyway, ignoring all rack information and falling
back to the original algorithm is not a bad choice, since it at least gives a
better guarantee of even replica distribution.

Also, for the command line tools it gives the user a choice if, for any
reason, they want to ignore rack information and fall back to the original
algorithm.
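
Roughly, the behavior I have in mind looks like the sketch below. The names
and the tool-level switch are only illustrative, not the actual AdminUtils or
TopicCommand code:

object RackAwareFallbackSketch {
  // brokerRacks maps broker id -> rack, None when a broker has no rack configured.
  // allowPartialRackInfo stands in for a hypothetical command line switch.
  def effectiveRackInfo(
      brokerRacks: Map[Int, Option[String]],
      allowPartialRackInfo: Boolean): Map[Int, Option[String]] = {
    val complete = brokerRacks.values.forall(_.isDefined)
    if (complete)
      brokerRacks                                       // safe to do rack aware assignment
    else if (allowPartialRackInfo)
      brokerRacks.keys.map(id => id -> (None: Option[String])).toMap  // ignore all rack info
    else
      throw new IllegalArgumentException(
        "Some brokers have no rack information; rerun with the override switch " +
          "to ignore rack information and use the original assignment algorithm")
  }
}

Dynamic topic creation would take the "ignore all rack information" branch
automatically, while the command line tools would fail fast unless the switch
is given.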


On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com> wrote:

> I have been busy with some time-pressing issues for the last few days. I will
> think about how the incomplete rack information will affect the balance and
> update the KIP by early next week.
>
> Thanks,
> Allen
>
>
> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io> wrote:
>
>> Few suggestions on improving the KIP
>>
>> *If some brokers have rack, and some do not, the algorithm will throw an
>> exception. This is to prevent incorrect assignment caused by user error.*
>>
>>
>> In the KIP, can you clearly state the user-facing behavior when some
>> brokers have rack information and some don't. Which actions and requests
>> will error out and how?
>>
>> *Even distribution of partition leadership among brokers*
>>
>>
>> There is some information about arranging the sorted broker list interlaced
>> with rack ids. Can you describe the changes to the current algorithm in a
>> little more detail? How does this interlacing work if only a subset of
>> brokers have the rack id configured? Does this still work if an uneven # of
>> brokers is assigned to each rack? It might work; I'm looking for more
>> details on the changes, since it will affect the behavior seen by the user
>> - imbalance on either the leaders or data or both.
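
To illustrate the interlacing idea (a rough sketch only, with made-up names;
this is not the KIP's actual algorithm): brokers are grouped by rack and then
taken round robin, one rack at a time, so that adjacent brokers in the
resulting list sit on different racks whenever possible.

object RackInterlacingSketch {
  def interlace(brokerRacks: Map[Int, String]): List[Int] = {
    // rack -> brokers on that rack, each group sorted for determinism
    val byRack: Map[String, Iterator[Int]] =
      brokerRacks.groupBy(_._2).map { case (rack, brokers) =>
        rack -> brokers.keys.toSeq.sorted.iterator
      }
    val racks = byRack.keys.toSeq.sorted
    val result = scala.collection.mutable.ListBuffer[Int]()
    while (result.size < brokerRacks.size) {
      for (rack <- racks) {
        val it = byRack(rack)
        if (it.hasNext) result += it.next()
      }
    }
    result.toList
  }

  def main(args: Array[String]): Unit = {
    // 4 brokers on rack1 and 2 on rack2: the list alternates racks while it can.
    val racks = Map(0 -> "rack1", 1 -> "rack1", 2 -> "rack2",
                    3 -> "rack1", 4 -> "rack2", 5 -> "rack1")
    println(interlace(racks)) // List(0, 2, 1, 4, 3, 5)
  }
}

Running the original round robin style assignment over a list ordered this
way should, as I understand the KIP, tend to spread the replicas of each
partition across racks; whether it still balances well with an uneven number
of brokers per rack is exactly the question above.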
>>
>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aaurad...@linkedin.com>
>> wrote:
>>
>> > I think this sounds reasonable. Anyone else have comments?
>> >
>> > Aditya
>> >
>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com>
>> wrote:
>> >
>> > > During the discussion in the hangout, it was mentioned that it would
>> be
>> > > desirable that consumers know the rack information of the brokers so
>> that
>> > > they can consume from the broker in the same rack to reduce latency.
>> As I
>> > > understand it, this will only be beneficial if the consumer can consume
>> > > from any broker in the ISR, which is not possible now.
>> > >
>> > > I suggest we skip the change to TMR. Once the change is made to
>> consumer
>> > to
>> > > be able to consume from any broker in ISR, the rack information can be
>> > > added to TMR.
>> > >
>> > > Another thing I want to confirm is the command line behavior. I think the
>> > > desirable default behavior is to fail fast on command line for
>> incomplete
>> > > rack mapping. The error message can include further instruction that
>> > tells
>> > > the user to add an extra argument (like "--allow-partial-rackinfo") to
>> > > suppress the error and do an imperfect rack aware assignment. If the
>> > > default behavior is to allow incomplete mapping, the error can still
>> be
>> > > easily missed.
>> > >
>> > > The affected command line tools are TopicCommand and
>> > > ReassignPartitionsCommand.
>> > >
>> > > Thanks,
>> > > Allen
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
>> > aaurad...@linkedin.com>
>> > > wrote:
>> > >
>> > > > Hi Allen,
>> > > >
>> > > > For TopicMetadataResponse to understand version, you can bump up the
>> > > > request version itself. Based on the version of the request, the
>> > response
>> > > > can be appropriately serialized. It shouldn't be a huge change. For
>> > > > example: We went through something similar for ProduceRequest
>> recently
>> > (
>> > > > https://reviews.apache.org/r/33378/)
>> > > > I guess the reason protocol information is not included in the TMR
>> is
>> > > > because the topic itself is independent of any particular protocol
>> (SSL
>> > > vs
>> > > > Plaintext). Having said that, I'm not sure we even need rack
>> > information
>> > > in
>> > > > TMR. What usecase were you thinking of initially?
>> > > >
>> > > > For 1 - I'd be fine with adding an option to the command line tools
>> > that
>> > > > check rack assignment. For e.g. "--strict-assignment" or something
>> > > similar.
>> > > >
>> > > > Aditya
>> > > >
>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
>> thing I
>> > > have
>> > > > > changed is removing the proposal to add rack to
>> > TopicMetadataResponse.
>> > > > The
>> > > > > reason is that unlike UpdateMetadataRequest, TopicMetadataResponse
>> > does
>> > > > not
>> > > > > understand version. I don't see a way to include rack without
>> > breaking
>> > > > old
>> > > > > version of clients. That's probably why secure protocol is not
>> > included
>> > > > in
>> > > > > the TopicMetadataResponse either. I think it will be a much bigger
>> > > change
>> > > > > to include rack in TopicMetadataResponse.
>> > > > >
>> > > > > For 1, my concern is that doing rack aware assignment without
>> > complete
>> > > > > broker to rack mapping will result in assignment that is not rack
>> > aware
>> > > > and
>> > > > > fail to provide fault tolerance in the event of rack outage. This
>> > kind
>> > > of
>> > > > > problem will be difficult to surface. And the cost of this
>> problem is
>> > > > high:
>> > > > > you have to do partition reassignment if you are lucky to spot the
>> > > > problem
>> > > > > early on or face the consequence of data loss during real rack
>> > outage.
>> > > > >
>> > > > > I do see the concern of fail-fast as it might also cause data
>> loss if
>> > > > > producer is not able to produce the message due to topic creation
>> > failure.
>> > > > Is
>> > > > > it feasible to treat dynamic topic creation and command tools
>> > > > differently?
>> > > > > We allow dynamic topic creation with incomplete broker-rack
>> mapping
>> > and
>> > > > > fail fast in command line. Another option is to let user determine
>> > the
>> > > > > behavior for command line. For example, by default fail fast in
>> > command
>> > > > > line but allow incomplete broker-rack mapping if another switch is
>> > > > > provided.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
>> > > > > aaurad...@linkedin.com.invalid> wrote:
>> > > > >
>> > > > > > Hey Allen,
>> > > > > >
>> > > > > > 1. If we choose fail fast topic creation, we will have topic
>> > creation
>> > > > > > failures while upgrading the cluster. I really doubt we want
>> this
>> > > > > behavior.
>> > > > > > Ideally, this should be invisible to clients of a cluster.
>> > Currently,
>> > > > > each
>> > > > > > broker is effectively its own rack. So we probably can use the
>> rack
>> > > > > > information whenever possible but not make it a hard
>> requirement.
>> > To
>> > > > > extend
>> > > > > > Gwen's example, one badly configured broker should not degrade
>> > topic
>> > > > > > creation for the entire cluster.
>> > > > > >
>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>> piece to
>> > > > > confirm
>> > > > > > that old clients will not see errors? I believe
>> > > > > ZookeeperConsumerConnector
>> > > > > > reads the Broker objects from ZK. I wanted to confirm that this
>> > will
>> > > > not
>> > > > > > cause any problems.
>> > > > > >
>> > > > > > 3. Could you elaborate your proposed changes to the
>> > > > UpdateMetadataRequest
>> > > > > > in the "Public Interfaces" section? Personally, I find this
>> format
>> > > easy
>> > > > > to
>> > > > > > read in terms of wire protocol changes:
>> > > > > >
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> > > > > >
>> > > > > > Aditya
>> > > > > >
>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
>> allenxw...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > KIP is updated include rack as an optional property for
>> broker.
>> > > > Please
>> > > > > > take
>> > > > > > > a look and let me know if more details are needed.
>> > > > > > >
>> > > > > > > For the case where some brokers have rack and some do not, the
>> > > > current
>> > > > > > KIP
>> > > > > > > uses the fail-fast behavior. If there are concerns, we can
>> > further
>> > > > > > discuss
>> > > > > > > this in the email thread or next hangout.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
>> > allenxw...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > That's a good question. I can think of three actions if the
>> > rack
>> > > > > > > > information is incomplete:
>> > > > > > > >
>> > > > > > > > 1. Treat the node without rack as if it is on its unique
>> rack
>> > > > > > > > 2. Disregard all rack information and fallback to current
>> > > algorithm
>> > > > > > > > 3. Fail-fast
>> > > > > > > >
>> > > > > > > > Now that I think about it, one and three make more sense. The reason
>> > > > > > > > for fail-fast is that a user mistake of not providing the rack may
>> > > > > > > > never be found if we tolerate it; the assignment may not be rack aware
>> > > > > > > > as the user expected, and that creates debugging problems when things
>> > > > > > > > fail.
>> > > > > > > >
>> > > > > > > > What do you think? If not fail-fast, is there any way we can make the
>> > > > > > > > user error stand out?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
>> > > g...@confluent.io>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
>> > assignment
>> > > > and
>> > > > > > some
>> > > > > > > >> don't, do we act like none of them have it? or like those
>> > > without
>> > > > > > > >> assignment are in their own rack?
>> > > > > > > >>
>> > > > > > > >> The first scenario is good when first setting up
>> > rack-awareness,
>> > > > but
>> > > > > > the
>> > > > > > > >> second makes more sense for on-going maintenance (I can
>> > totally
>> > > > see
>> > > > > > > >> someone
>> > > > > > > >> adding a node and forgetting to set the rack property, we
>> > don't
>> > > > want
>> > > > > > > this
>> > > > > > > >> to change behavior for anything except the new node).
>> > > > > > > >>
>> > > > > > > >> What do you think?
>> > > > > > > >>
>> > > > > > > >> Gwen
>> > > > > > > >>
>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
>> > > > allenxw...@gmail.com>
>> > > > > > > >> wrote:
>> > > > > > > >>
>> > > > > > > >> > For scenario 1:
>> > > > > > > >> >
>> > > > > > > >> > - Add the rack information to broker property file or
>> > > > dynamically
>> > > > > > set
>> > > > > > > >> it in
>> > > > > > > >> > the wrapper code to bootstrap Kafka server. You would do
>> > that
>> > > > for
>> > > > > > all
>> > > > > > > >> > brokers and restart the brokers one by one.
>> > > > > > > >> >
>> > > > > > > >> > In this scenario, the complete broker to rack mapping may
>> > not
>> > > be
>> > > > > > > >> available
>> > > > > > > >> > until every broker is restarted. During that time we fall
>> > back
>> > > > to
>> > > > > > > >> default
>> > > > > > > >> > replica assignment algorithm.
>> > > > > > > >> >
>> > > > > > > >> > For scenario 2:
>> > > > > > > >> >
>> > > > > > > >> > - Add the rack information to broker property file or
>> > > > dynamically
>> > > > > > set
>> > > > > > > >> it in
>> > > > > > > >> > the wrapper code and start the broker.
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
>> > > > g...@confluent.io>
>> > > > > > > >> wrote:
>> > > > > > > >> >
>> > > > > > > >> > > Can you clarify the workflow for the following
>> scenarios:
>> > > > > > > >> > >
>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
>> > > information
>> > > > > for
>> > > > > > > >> each
>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
>> > rack
>> > > it
>> > > > > > > >> belongs on
>> > > > > > > >> > > while adding it.
>> > > > > > > >> > >
>> > > > > > > >> > > Thanks!
>> > > > > > > >> > >
>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
>> > > > > allenxw...@gmail.com
>> > > > > > >
>> > > > > > > >> > wrote:
>> > > > > > > >> > >
>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
>> > > > recommendation
>> > > > > is
>> > > > > > > to
>> > > > > > > >> > make
>> > > > > > > >> > > > rack a broker property in ZooKeeper. For users
>> with
>> > > > > existing
>> > > > > > > rack
>> > > > > > > >> > > > information stored somewhere, they would need to
>> > retrieve
>> > > > the
>> > > > > > > >> > information
>> > > > > > > >> > > > at broker start up and dynamically set the rack
>> > property,
>> > > > > which
>> > > > > > > can
>> > > > > > > >> be
>> > > > > > > >> > > > implemented as a wrapper to bootstrap broker. There
>> will
>> > > be
>> > > > no
>> > > > > > > >> > interface
>> > > > > > > >> > > or
>> > > > > > > >> > > > pluggable implementation to retrieve the rack
>> > information.
>> > > > > > > >> > > >
>> > > > > > > >> > > > The assumption is that you always need to restart the
>> > > broker
>> > > > > to
>> > > > > > > >> make a
>> > > > > > > >> > > > change to the rack.
>> > > > > > > >> > > >
>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
>> > > possible
>> > > > > to
>> > > > > > > make
>> > > > > > > >> > rack
>> > > > > > > >> > > > part of the meta data to help the consumer choose
>> which
>> > in
>> > > > > sync
>> > > > > > > >> replica
>> > > > > > > >> > > to
>> > > > > > > >> > > > consume from as part of the future consumer
>> enhancement.
>> > > > > > > >> > > >
>> > > > > > > >> > > > I will update the KIP.
>> > > > > > > >> > > >
>> > > > > > > >> > > > Thanks,
>> > > > > > > >> > > > Allen
>> > > > > > > >> > > >
>> > > > > > > >> > > >
>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
>> > > > > > allenxw...@gmail.com>
>> > > > > > > >> > wrote:
>> > > > > > > >> > > >
>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was
>> not
>> > > > > > discussed
>> > > > > > > >> due
>> > > > > > > >> > to
>> > > > > > > >> > > > > time constraint.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > However, after hearing discussion of KIP-35, I have
>> > the
>> > > > > > feeling
>> > > > > > > >> that
>> > > > > > > >> > > > > incompatibility (caused by new broker property)
>> > between
>> > > > > > brokers
>> > > > > > > >> with
>> > > > > > > >> > > > > different versions  will be solved there. In
>> addition,
>> > > > > having
>> > > > > > > >> rack
>> > > > > > > >> > in
>> > > > > > > >> > > > > broker property as meta data may also help
>> consumers
>> > in
>> > > > the
>> > > > > > > >> future.
>> > > > > > > >> > So
>> > > > > > > >> > > I
>> > > > > > > >> > > > am
>> > > > > > > >> > > > > open to adding the rack property to the broker.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
>> hangout.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
>> > > > > > > allenxw...@gmail.com
>> > > > > > > >> >
>> > > > > > > >> > > > wrote:
>> > > > > > > >> > > > >
>> > > > > > > >> > > > >> Can you send me the information on the next KIP
>> > > hangout?
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached.
>> In
>> > > > > > KafkaApis,
>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the
>> > > mapping
>> > > > > is
>> > > > > > > >> needed
>> > > > > > > >> > > for
>> > > > > > > >> > > > >> auto topic creation. This will ensure latest
>> mapping
>> > is
>> > > > > used
>> > > > > > at
>> > > > > > > >> any
>> > > > > > > >> > > > time.
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
>> > simple
>> > > > to
>> > > > > > > reuse
>> > > > > > > >> the
>> > > > > > > >> > > > same
>> > > > > > > >> > > > >> interface in command line tools.
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar
>> <
>> > > > > > > >> > > > >> aaurad...@linkedin.com.invalid> wrote:
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
>> hangout?
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
>> useful
>> > > > but I
>> > > > > > do
>> > > > > > > >> see a
>> > > > > > > >> > > few
>> > > > > > > >> > > > >>> concerns:
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> - The RackLocator (as described in the document),
>> > > > implies
>> > > > > > that
>> > > > > > > >> it
>> > > > > > > >> > can
>> > > > > > > >> > > > >>> discover rack information for any node in the
>> > cluster.
>> > > > How
>> > > > > > > does
>> > > > > > > >> it
>> > > > > > > >> > > deal
>> > > > > > > >> > > > >>> with rack location changes? For example, if I
>> moved
>> > > > broker
>> > > > > > id
>> > > > > > > >> (1)
>> > > > > > > >> > > from
>> > > > > > > >> > > > >>> rack
>> > > > > > > >> > > > >>> X to Y, I only have to start that broker with a
>> > newer
>> > > > rack
>> > > > > > > >> config.
>> > > > > > > >> > If
>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack information
>> at
>> > > > start
>> > > > > up
>> > > > > > > >> time,
>> > > > > > > >> > > any
>> > > > > > > >> > > > >>> change to a broker will require bouncing the
>> entire
>> > > > > cluster
>> > > > > > > >> since
>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in
>> the
>> > > > > cluster.
>> > > > > > > >> > > > >>> For this reason it may be simpler to have each
>> node
>> > be
>> > > > > aware
>> > > > > > > of
>> > > > > > > >> its
>> > > > > > > >> > > own
>> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external
>> > > service
>> > > > > > being
>> > > > > > > >> > > available
>> > > > > > > >> > > > >>> to
>> > > > > > > >> > > > >>> serve rack information.
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of
>> other
>> > > > > systems
>> > > > > > > deal
>> > > > > > > >> > with
>> > > > > > > >> > > > >>> zone/rack awareness.
>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
>> > > > > > > >> > > > >>> (Property File configuration)
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > > > > >> > > > >>> (Dynamic inference)
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment
>> > based
>> > > on
>> > > > > > > >> > > configuration.
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> Aditya
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
>> > > > > > > >> allenxw...@gmail.com
>> > > > > > > >> > >
>> > > > > > > >> > > > >>> wrote:
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
>> > migration
>> > > > > with
>> > > > > > > >> > existing
>> > > > > > > >> > > > >>> > broker-rack mapping
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If
>> > rack
>> > > > is
>> > > > > > > >> available
>> > > > > > > >> > > > from
>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users
>> > with
>> > > > > > existing
>> > > > > > > >> > > > >>> broker-rack
>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
>> pluggable
>> > > way
>> > > > > or
>> > > > > > > they
>> > > > > > > >> > can
>> > > > > > > >> > > > >>> transfer
>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at
>> rolling
>> > > > > upgrade
>> > > > > > > >> when
>> > > > > > > >> > we
>> > > > > > > >> > > > have
>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with
>> older
>> > > > > version
>> > > > > > of
>> > > > > > > >> > Kafka,
>> > > > > > > >> > > > >>> will it
>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any
>> > > > workaround?
>> > > > > I
>> > > > > > > also
>> > > > > > > >> > > think
>> > > > > > > >> > > > it
>> > > > > > > >> > > > >>> > would be better not to have rack in the
>> controller
>> > > > wire
>> > > > > > > >> protocol
>> > > > > > > >> > > but
>> > > > > > > >> > > > >>> not
>> > > > > > > >> > > > >>> > sure if it is achievable.
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > Thanks,
>> > > > > > > >> > > > >>> > Allen
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
>> > > > > > > >> tpal...@gmail.com>
>> > > > > > > >> > > > >>> wrote:
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
>> locator.
>> > > For
>> > > > > > > >> example, we
>> > > > > > > >> > > > >>> already
>> > > > > > > >> > > > >>> > > have an interface for discovering information
>> > > about
>> > > > > the
>> > > > > > > >> > physical
>> > > > > > > >> > > > >>> location
>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of
>> having to
>> > > > > > maintain
>> > > > > > > >> data
>> > > > > > > >> > in
>> > > > > > > >> > > > >>> > multiple
>> > > > > > > >> > > > >>> > > places.
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> > > -Todd
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
>> > Auradkar <
>> > > > > > > >> > > > >>> > > aaurad...@linkedin.com.invalid> wrote:
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator
>> > > class
>> > > > > that
>> > > > > > > is
>> > > > > > > >> > > > pluggable
>> > > > > > > >> > > > >>> > seems
>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
>> > potentially
>> > > > > > non-ZK
>> > > > > > > >> > storage
>> > > > > > > >> > > > >>> for the
>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk
>> under
>> > > > > > > >> > > > >>> /brokers/ids/<broker_id>
>> > > > > > > >> > > > >>> > > > similar to other broker properties and add
>> a
>> > > > config
>> > > > > in
>> > > > > > > >> > > > KafkaConfig
>> > > > > > > >> > > > >>> > called
>> > > > > > > >> > > > >>> > > > "rack".
>> > > > > > > >> > > > >>> > > >
>> > > > > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
>> > > > > > > >> > > "rack":
>> > > > > > > >> > > > >>> > "abc"}
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > Aditya
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
>> Shapira
>> > <
>> > > > > > > >> > > g...@confluent.io
>> > > > > > > >> > > > >
>> > > > > > > >> > > > >>> > wrote:
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > > Hi,
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for
>> > this.
>> > > > This
>> > > > > > is
>> > > > > > > >> super
>> > > > > > > >> > > > >>> important
>> > > > > > > >> > > > >>> > > for
>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > Few questions:
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
>> > > > > possible"?
>> > > > > > > I'd
>> > > > > > > >> > want
>> > > > > > > >> > > to
>> > > > > > > >> > > > >>> > balance
>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network
>> > > > > utilization
>> > > > > > > >> > (traffic
>> > > > > > > >> > > > >>> within a
>> > > > > > > >> > > > >>> > > > rack
>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One
>> > > replica
>> > > > > on
>> > > > > > a
>> > > > > > > >> > > different
>> > > > > > > >> > > > >>> rack
>> > > > > > > >> > > > >>> > > and
>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible)
>> sounds
>> > > > better
>> > > > > to
>> > > > > > > me.
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly
>> complex
>> > > > > compared
>> > > > > > to
>> > > > > > > >> > > adding a
>> > > > > > > >> > > > >>> > > > rack.number
>> > > > > > > >> > > > >>> > > > > property to the broker properties file.
>> Why
>> > do
>> > > > we
>> > > > > > want
>> > > > > > > >> > that?
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > Gwen
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen
>> > Wang <
>> > > > > > > >> > > > >>> allenxw...@gmail.com>
>> > > > > > > >> > > > >>> > > > wrote:
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware
>> > replica
>> > > > > > > >> assignment.
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation
>> > > provided
>> > > > by
>> > > > > > the
>> > > > > > > >> > racks
>> > > > > > > >> > > in
>> > > > > > > >> > > > >>> data
>> > > > > > > >> > > > >>> > > > center
>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to
>> > provide
>> > > > > fault
>> > > > > > > >> > > tolerance.
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > Thanks,
>> > > > > > > >> > > > >>> > > > > > Allen
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >
>> > > > > > > >> > > >
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Thanks,
>> Neha
>>
>
>
