Allen,

Thanks for the proposal. A few comments.

1. Since this KIP changes the inter-broker communication protocol
(UpdateMetadataRequest), we will need to document the upgrade path (similar
to what's described in
http://kafka.apache.org/090/documentation.html#upgrade).

2. It might be useful to include the rack info of the broker in
TopicMetadataResponse. This would help with administrative tasks, as well
as read affinity in the future.

Jun



On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxw...@gmail.com> wrote:

> If there are no more comments, I would like to call for a vote.
>
>
> On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxw...@gmail.com> wrote:
>
> > The KIP is updated with more details, including how to handle the
> > situation where rack information is incomplete.
> >
> > In the situation where rack information is incomplete but we want to
> > continue with the assignment, I have suggested ignoring all rack
> > information and falling back to the original algorithm. The reason is
> > explained below:
> >
> > The other options are to assume that each broker without a rack belongs
> > to its own unique rack, or that they all belong to one "default" rack.
> > Whichever we choose, it is highly likely to result in an uneven number of
> > brokers across racks, and quite possibly the "made up" racks will have
> > far fewer brokers. As I explained in the KIP, an uneven number of brokers
> > per rack leads to an uneven distribution of replicas among brokers (even
> > though the leader distribution is still even). The brokers in a rack
> > that has fewer brokers will get more replicas per broker than brokers in
> > other racks.
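A rough arithmetic sketch of the imbalance argument above, under simplifying assumptions that are not taken from the KIP: the replication factor equals the number of racks, so every partition puts exactly one replica in each rack, and replicas are spread evenly over the brokers inside a rack. The function name is hypothetical.

```python
from collections import Counter


def replicas_per_broker(broker_racks, num_partitions):
    """Average replicas per broker in each rack, assuming every partition
    places exactly one replica in every rack (replication factor == #racks)
    and replicas are spread evenly over the brokers inside a rack.

    broker_racks: dict mapping broker id -> rack id.
    """
    brokers_in_rack = Counter(broker_racks.values())
    # Each rack stores num_partitions replicas in total, shared by its brokers.
    return {rack: num_partitions / n for rack, n in brokers_in_rack.items()}


# Three brokers in rack "a"; one broker without a rack put into a "default" rack.
cluster = {0: "a", 1: "a", 2: "a", 3: "default"}
print(replicas_per_broker(cluster, 12))  # {'a': 4.0, 'default': 12.0}
```

Here the single broker in the made-up rack would carry three times as many replicas as each broker in rack "a".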
> >
> > Given this, and since the replica assignment produced will be incorrect
> > from a rack-aware point of view anyway, ignoring all rack information and
> > falling back to the original algorithm is not a bad choice: it at least
> > gives a better guarantee of even replica distribution.
> >
> > Also, for command line tools, it gives users a choice if for any reason
> > they want to ignore rack information and fall back to the original
> > algorithm.
> >
> >
> > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >
> >> I have been busy with some time-pressing issues for the last few days. I
> >> will think about how incomplete rack information will affect the balance
> >> and update the KIP by early next week.
> >>
> >> Thanks,
> >> Allen
> >>
> >>
> >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <n...@confluent.io> wrote:
> >>
> >>> A few suggestions on improving the KIP:
> >>>
> >>> > *If some brokers have rack, and some do not, the algorithm will throw
> >>> > an exception. This is to prevent incorrect assignment caused by user
> >>> > error.*
> >>>
> >>>
> >>> In the KIP, can you clearly state the user-facing behavior when some
> >>> brokers have rack information and some don't? Which actions and requests
> >>> will error out, and how?
> >>>
> >>> *Even distribution of partition leadership among brokers*
> >>>
> >>>
> >>> There is some information about arranging the sorted broker list
> >>> interlaced with rack ids. Can you describe the changes to the current
> >>> algorithm in a little more detail? How does this interlacing work if
> >>> only a subset of brokers have the rack id configured? Does this still
> >>> work if an uneven number of brokers is assigned to each rack? It might
> >>> work; I'm looking for more details on the changes, since they will
> >>> affect the behavior seen by the user: imbalance in the leaders, the
> >>> data, or both.
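One possible reading of "sorted broker list interlaced with rack ids", sketched under assumptions of my own rather than taken from the KIP text: group brokers by rack, sort within each rack, then take one broker from each rack in turn. Function names are hypothetical. The example is deliberately uneven: rack "a" has three brokers, "b" two, "c" one.

```python
from itertools import zip_longest


def rack_alternated_brokers(broker_racks):
    """Interlace brokers by rack: sort brokers within each rack, then take
    one broker from each rack (racks in sorted order) per round.

    broker_racks: dict mapping broker id -> rack id.
    """
    by_rack = {}
    for broker in sorted(broker_racks):
        by_rack.setdefault(broker_racks[broker], []).append(broker)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    # zip_longest pads shorter racks with None once they run out of brokers.
    return [b for row in zip_longest(*columns) for b in row if b is not None]


def leaders_round_robin(brokers, num_partitions):
    """Simplified leader choice: partition p is led by brokers[p % n]."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


racks = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
order = rack_alternated_brokers(racks)
print(order)  # [0, 3, 5, 1, 4, 2]
print(leaders_round_robin(order, 6))
```

Because every broker appears exactly once in the list, round-robin leadership over it stays even. The sketch does not answer the harder part of the question, where followers land once the smaller racks are exhausted: the list wraps from broker 2 back to broker 0, so two rack "a" brokers end up adjacent.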
> >>>
> >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
> >>>
> >>> > I think this sounds reasonable. Anyone else have comments?
> >>> >
> >>> > Aditya
> >>> >
> >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> >
> >>> > > During the discussion in the hangout, it was mentioned that it would
> >>> > > be desirable for consumers to know the rack information of the
> >>> > > brokers, so that they can consume from a broker in the same rack to
> >>> > > reduce latency. As I understand it, this will only be beneficial if
> >>> > > consumers can consume from any broker in the ISR, which is not
> >>> > > possible now.
> >>> > >
> >>> > > I suggest we skip the change to TMR. Once the consumer is changed to
> >>> > > be able to consume from any broker in the ISR, the rack information
> >>> > > can be added to TMR.
> >>> > >
> >>> > > Another thing I want to confirm is the command line behavior. I think
> >>> > > the desirable default behavior is to fail fast on the command line
> >>> > > for incomplete rack mapping. The error message can include further
> >>> > > instructions telling the user to add an extra argument (like
> >>> > > "--allow-partial-rackinfo") to suppress the error and do an
> >>> > > imperfect rack-aware assignment. If the default behavior is to allow
> >>> > > incomplete mapping, the error can easily be missed.
> >>> > >
> >>> > > The affected command line tools are TopicCommand and
> >>> > > ReassignPartitionsCommand.
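A minimal sketch of the proposed default, written in Python with argparse purely for illustration (the real tools, TopicCommand and ReassignPartitionsCommand, are not Python). The `--allow-partial-rackinfo` name is the example from the message above; the helper functions and the hard-coded cluster state are hypothetical.

```python
import argparse
import sys


class IncompleteRackInfoError(Exception):
    """Raised when some, but not all, brokers have a rack configured."""


def check_rack_info(broker_racks, allow_partial):
    """Decide how to proceed given a broker -> rack map (None = no rack)."""
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        return "rack-aware"
    if len(missing) == len(broker_racks):
        return "no-rack"  # nobody has a rack: plain original assignment
    if allow_partial:
        return "partial"  # user accepted an imperfect rack-aware assignment
    raise IncompleteRackInfoError(
        f"Brokers {missing} have no rack configured. Re-run with "
        "--allow-partial-rackinfo to accept an imperfect rack-aware assignment."
    )


def main(argv=None, broker_racks=None):
    parser = argparse.ArgumentParser(description="rack check sketch")
    parser.add_argument("--allow-partial-rackinfo", action="store_true")
    args = parser.parse_args(argv)
    # Hypothetical cluster state; a real tool would read it from ZooKeeper.
    racks = broker_racks if broker_racks is not None else {0: "a", 1: "b", 2: None}
    try:
        mode = check_rack_info(racks, args.allow_partial_rackinfo)
    except IncompleteRackInfoError as err:
        print(f"Error: {err}", file=sys.stderr)
        return 1
    print(f"Proceeding with {mode} assignment")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Failing by default and naming the override in the error message is what keeps the mistake from being silently missed.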
> >>> > >
> >>> > > Thanks,
> >>> > > Allen
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aaurad...@linkedin.com> wrote:
> >>> > >
> >>> > > > Hi Allen,
> >>> > > >
> >>> > > > For TopicMetadataResponse to understand versions, you can bump up
> >>> > > > the request version itself. Based on the version of the request,
> >>> > > > the response can be serialized appropriately. It shouldn't be a
> >>> > > > huge change. For example, we went through something similar for
> >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> >>> > > > I guess the reason protocol information is not included in the TMR
> >>> > > > is that the topic itself is independent of any particular protocol
> >>> > > > (SSL vs. Plaintext). Having said that, I'm not sure we even need
> >>> > > > rack information in TMR. What use case were you thinking of
> >>> > > > initially?
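Choosing the response layout from the request version can be sketched as follows. The field order, the version numbers and the helper names are hypothetical and do not reproduce the real TopicMetadataResponse schema; the one detail borrowed from the Kafka protocol is the string encoding (int16 length prefix, with -1 marking a null string).

```python
import struct


def encode_string(value):
    """Kafka-style nullable string: int16 length (-1 for null) + UTF-8 bytes."""
    if value is None:
        return struct.pack(">h", -1)
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_broker(version, node_id, host, port, rack):
    """Serialize one broker entry; the rack field exists only from version 1 on.

    An old client keeps sending version 0, so it never receives bytes it
    does not know how to parse.
    """
    out = struct.pack(">i", node_id) + encode_string(host) + struct.pack(">i", port)
    if version >= 1:
        out += encode_string(rack)
    return out


v0 = encode_broker(0, 1, "h1", 9092, "rack-a")
v1 = encode_broker(1, 1, "h1", 9092, "rack-a")
print(len(v0), len(v1))  # 12 20
```

The version 1 encoding is the version 0 bytes plus a trailing rack field, which is why bumping the request version is enough to keep old clients working.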
> >>> > > >
> >>> > > > For 1, I'd be fine with adding an option to the command line tools
> >>> > > > that checks rack assignment, e.g. "--strict-assignment" or
> >>> > > > something similar.
> >>> > > >
> >>> > > > Aditya
> >>> > > >
> >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > >
> >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing
> >>> > > > > I have changed is removing the proposal to add rack to
> >>> > > > > TopicMetadataResponse. The reason is that, unlike
> >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> >>> > > > > versions. I don't see a way to include rack without breaking old
> >>> > > > > versions of clients. That's probably why the security protocol is
> >>> > > > > not included in the TopicMetadataResponse either. I think it will
> >>> > > > > be a much bigger change to include rack in TopicMetadataResponse.
> >>> > > > >
> >>> > > > > For 1, my concern is that doing rack-aware assignment without a
> >>> > > > > complete broker-to-rack mapping will result in an assignment that
> >>> > > > > is not rack aware and fails to provide fault tolerance in the
> >>> > > > > event of a rack outage. This kind of problem will be difficult to
> >>> > > > > surface, and its cost is high: you have to do partition
> >>> > > > > reassignment if you are lucky enough to spot the problem early
> >>> > > > > on, or face data loss during a real rack outage.
> >>> > > > >
> >>> > > > > I do see the concern with fail-fast, as it might also cause data
> >>> > > > > loss if a producer is not able to produce messages due to topic
> >>> > > > > creation failure. Is it feasible to treat dynamic topic creation
> >>> > > > > and command line tools differently? We could allow dynamic topic
> >>> > > > > creation with incomplete broker-rack mapping and fail fast on the
> >>> > > > > command line. Another option is to let the user determine the
> >>> > > > > behavior for the command line: for example, fail fast by default
> >>> > > > > but allow incomplete broker-rack mapping if another switch is
> >>> > > > > provided.
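The two options in the last paragraph combine into a small decision table. A sketch with hypothetical enum and function names of "allow dynamic topic creation with incomplete mapping, fail fast on the command line unless a switch is given":

```python
from enum import Enum


class Source(Enum):
    AUTO_CREATE = "dynamic (auto) topic creation"
    COMMAND_LINE = "command line tool"


class Decision(Enum):
    RACK_AWARE = "rack-aware assignment"
    ORIGINAL = "original, rack-unaware assignment"
    BEST_EFFORT = "assignment with incomplete rack mapping"
    FAIL = "reject the request"


def decide(source, broker_racks, allow_partial=False):
    """broker_racks: dict broker id -> rack id, None when unset."""
    racks = list(broker_racks.values())
    if all(r is not None for r in racks):
        return Decision.RACK_AWARE
    if all(r is None for r in racks):
        return Decision.ORIGINAL  # no rack info anywhere: nothing to enforce
    if source is Source.AUTO_CREATE:
        return Decision.BEST_EFFORT  # never block producers on a config gap
    # Command line: fail fast unless the user passed the extra switch.
    return Decision.BEST_EFFORT if allow_partial else Decision.FAIL


print(decide(Source.COMMAND_LINE, {0: "a", 1: None}))  # Decision.FAIL
```

The asymmetry reflects where a human is present: a command line user can fix the config and retry, while a producer triggering auto creation cannot.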
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> >>> > > > > aaurad...@linkedin.com.invalid> wrote:
> >>> > > > >
> >>> > > > > > Hey Allen,
> >>> > > > > >
> >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
> >>> > > > > > creation failures while upgrading the cluster. I really doubt
> >>> > > > > > we want this behavior. Ideally, this should be invisible to
> >>> > > > > > clients of a cluster. Currently, each broker is effectively its
> >>> > > > > > own rack, so we can probably use the rack information whenever
> >>> > > > > > possible but not make it a hard requirement. To extend Gwen's
> >>> > > > > > example, one badly configured broker should not degrade topic
> >>> > > > > > creation for the entire cluster.
> >>> > > > > >
> >>> > > > > > 2. Upgrade scenario: can you add a section on the upgrade piece
> >>> > > > > > to confirm that old clients will not see errors? I believe
> >>> > > > > > ZookeeperConsumerConnector reads the Broker objects from ZK. I
> >>> > > > > > wanted to confirm that this will not cause any problems.
> >>> > > > > >
> >>> > > > > > 3. Could you elaborate on your proposed changes to the
> >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> >>> > > > > > Personally, I find this format easy to read in terms of wire
> >>> > > > > > protocol changes:
> >>> > > > > >
> >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >>> > > > > >
> >>> > > > > > Aditya
> >>> > > > > >
> >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > >
> >>> > > > > > > The KIP is updated to include rack as an optional property for
> >>> > > > > > > brokers. Please take a look and let me know if more details
> >>> > > > > > > are needed.
> >>> > > > > > >
> >>> > > > > > > For the case where some brokers have a rack and some do not,
> >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
> >>> > > > > > > concerns, we can discuss this further in the email thread or
> >>> > > > > > > at the next hangout.
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > >
> >>> > > > > > > > That's a good question. I can think of three actions if the
> >>> > > > > > > > rack information is incomplete:
> >>> > > > > > > >
> >>> > > > > > > > 1. Treat the node without a rack as if it is on its own unique rack
> >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
> >>> > > > > > > > 3. Fail fast
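The three options can be compared with a helper that turns a possibly incomplete broker-to-rack map into the map an assignment algorithm would actually use. This is illustrative only: the `"unique-<id>"` naming for synthesized racks is an assumption, `None` stands for "no rack configured", and a `None` return means "ignore racks and use the current algorithm".

```python
class IncompleteRackInfoError(Exception):
    """Raised by the fail-fast strategy."""


def effective_racks(broker_racks, strategy):
    """Return the broker -> rack map to assign with, or None for 'ignore racks'.

    strategy 1: a broker without a rack gets its own synthetic rack
    strategy 2: discard all rack info (caller uses the current algorithm)
    strategy 3: fail fast
    """
    missing = [b for b, r in broker_racks.items() if r is None]
    if not missing:
        return dict(broker_racks)  # complete mapping: every strategy agrees
    if strategy == 1:
        return {b: (r if r is not None else f"unique-{b}") for b, r in broker_racks.items()}
    if strategy == 2:
        return None
    if strategy == 3:
        raise IncompleteRackInfoError(f"brokers without rack: {sorted(missing)}")
    raise ValueError(f"unknown strategy {strategy}")


print(effective_racks({0: "a", 1: "a", 2: None}, 1))
# {0: 'a', 1: 'a', 2: 'unique-2'}
```

Strategy 1 matches today's implicit behavior, where each broker is effectively its own rack; its cost is the one-broker racks discussed elsewhere in the thread.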
> >>> > > > > > > >
> >>> > > > > > > > Now that I think about it, options 1 and 3 make more sense.
> >>> > > > > > > > The reason for fail-fast is that a user's mistake in not
> >>> > > > > > > > providing the rack may never be found if we tolerate it. The
> >>> > > > > > > > assignment may then not be rack aware as the user expected,
> >>> > > > > > > > which creates debugging problems when things fail.
> >>> > > > > > > >
> >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> >>> > > > > > > > make the user error stand out?
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <g...@confluent.io> wrote:
> >>> > > > > > > >
> >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> >>> > > > > > > >> assignment and some don't, do we act like none of them have
> >>> > > > > > > >> it, or like those without assignment are in their own rack?
> >>> > > > > > > >>
> >>> > > > > > > >> The first scenario is good when first setting up rack
> >>> > > > > > > >> awareness, but the second makes more sense for ongoing
> >>> > > > > > > >> maintenance (I can totally see someone adding a node and
> >>> > > > > > > >> forgetting to set the rack property; we don't want this to
> >>> > > > > > > >> change behavior for anything except the new node).
> >>> > > > > > > >>
> >>> > > > > > > >> What do you think?
> >>> > > > > > > >>
> >>> > > > > > > >> Gwen
> >>> > > > > > > >>
> >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > > >>
> >>> > > > > > > >> > For scenario 1:
> >>> > > > > > > >> >
> >>> > > > > > > >> > - Add the rack information to the broker property file, or
> >>> > > > > > > >> > set it dynamically in the wrapper code that bootstraps the
> >>> > > > > > > >> > Kafka server. You would do that for all brokers and restart
> >>> > > > > > > >> > the brokers one by one.
> >>> > > > > > > >> >
> >>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping may
> >>> > > > > > > >> > not be available until every broker is restarted. During
> >>> > > > > > > >> > that time we fall back to the default replica assignment
> >>> > > > > > > >> > algorithm.
> >>> > > > > > > >> >
> >>> > > > > > > >> > For scenario 2:
> >>> > > > > > > >> >
> >>> > > > > > > >> > - Add the rack information to the broker property file, or
> >>> > > > > > > >> > set it dynamically in the wrapper code, and start the
> >>> > > > > > > >> > broker.
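A wrapper of the kind described for both scenarios might look like the sketch below: look up the rack at startup and write it into the broker's properties file before launching the server. The lookup source (an environment variable named `KAFKA_RACK`), the config path and the helper names are hypothetical. The key `broker.rack` is the name Kafka later adopted for this setting; at this point in the thread the proposed key was still `rack`.

```python
import os


def set_property(properties_text, key, value):
    """Return the properties text with `key=value`, replacing any old value.

    Simplified: only recognizes `key=value` / `key = value` lines, not the
    `key:value` or `key value` forms that Java properties files also allow.
    """
    kept = [
        line
        for line in properties_text.splitlines()
        if line.split("=", 1)[0].strip() != key
    ]
    kept.append(f"{key}={value}")
    return "\n".join(kept) + "\n"


def prepare_config(path, rack):
    """Rewrite the broker config file in place so the broker starts with its rack."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_property(text, "broker.rack", rack))


if __name__ == "__main__":
    # Hypothetical lookup: a real wrapper might query an inventory service,
    # then launch the usual broker start script with the updated file.
    rack = os.environ.get("KAFKA_RACK")
    if rack:
        prepare_config("config/server.properties", rack)
```

For scenario 1 this runs before each rolling restart; for scenario 2 it runs once before the new broker's first start.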
> >>> > > > > > > >> >
> >>> > > > > > > >> >
> >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <g...@confluent.io> wrote:
> >>> > > > > > > >> >
> >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> >>> > > > > > > >> > > information for each.
> >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which rack
> >>> > > > > > > >> > > it belongs on while adding it.
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > Thanks!
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> >>> > > > > > > >> > > > somewhere else would need to retrieve the information at
> >>> > > > > > > >> > > > broker startup and dynamically set the rack property,
> >>> > > > > > > >> > > > which can be implemented as a wrapper that bootstraps
> >>> > > > > > > >> > > > the broker. There will be no interface or pluggable
> >>> > > > > > > >> > > > implementation to retrieve the rack information.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > The assumption is that you always need to restart the
> >>> > > > > > > >> > > > broker to make a change to the rack.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> >>> > > > > > > >> > > > possible to make rack part of the metadata to help the
> >>> > > > > > > >> > > > consumer choose which in-sync replica to consume from, as
> >>> > > > > > > >> > > > part of the future consumer enhancement.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > I will update the KIP.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > Thanks,
> >>> > > > > > > >> > > > Allen
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout, but this KIP was not
> >>> > > > > > > >> > > > > discussed due to time constraints.
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have
> >>> > > > > > > >> > > > > the feeling that the incompatibility (caused by the new
> >>> > > > > > > >> > > > > broker property) between brokers with different
> >>> > > > > > > >> > > > > versions will be solved there. In addition, having rack
> >>> > > > > > > >> > > > > in the broker properties as metadata may also help
> >>> > > > > > > >> > > > > consumers in the future. So I am open to adding a rack
> >>> > > > > > > >> > > > > property to the broker.
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> >>> > > > > > > >> > > > >> KafkaApis, RackLocator.getRackInfo() is called each time
> >>> > > > > > > >> > > > >> the mapping is needed for auto topic creation. This
> >>> > > > > > > >> > > > >> ensures the latest mapping is always used.
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple
> >>> > > > > > > >> > > > >> to reuse the same interface in command line tools.
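For comparison with the ZooKeeper-property approach the thread later settled on, the pluggable, uncached design described here can be sketched as an interface plus one static implementation (the static node-to-zone style Aditya mentions for Voldemort below). `RackLocator.getRackInfo()` is the name from the KIP draft; this Python rendering, `StaticRackLocator`, and `racks_for_auto_create` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class RackLocator(ABC):
    """Pluggable source of the complete broker -> rack mapping."""

    @abstractmethod
    def get_rack_info(self) -> Dict[int, str]:
        ...


class StaticRackLocator(RackLocator):
    """Mapping supplied up front, e.g. parsed from a config file."""

    def __init__(self, mapping: Dict[int, str]):
        self._mapping = dict(mapping)

    def get_rack_info(self) -> Dict[int, str]:
        return dict(self._mapping)


def racks_for_auto_create(
    locator: RackLocator, live_brokers: Iterable[int]
) -> Dict[int, Optional[str]]:
    # No caching: ask the locator on every call, so the latest mapping is used.
    info = locator.get_rack_info()
    return {b: info.get(b) for b in live_brokers}


locator = StaticRackLocator({1: "x", 2: "y"})
print(racks_for_auto_create(locator, [1, 2, 3]))  # {1: 'x', 2: 'y', 3: None}
```

The flip side of "no cache" is that every topic creation depends on the locator being reachable, which is the external-service concern raised in the reply.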
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> >>> > > > > > > >> > > > >> aaurad...@linkedin.com.invalid> wrote:
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >>> Perhaps we can discuss this during the next KIP hangout?
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful,
> >>> > > > > > > >> > > > >>> but I also see a few concerns:
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document)
> >>> > > > > > > >> > > > >>> implies that it can discover rack information for any
> >>> > > > > > > >> > > > >>> node in the cluster. How does it deal with rack
> >>> > > > > > > >> > > > >>> location changes? For example, if I moved broker id
> >>> > > > > > > >> > > > >>> (1) from rack X to Y, I only have to start that broker
> >>> > > > > > > >> > > > >>> with a newer rack config. If RackLocator discovers
> >>> > > > > > > >> > > > >>> broker -> rack information at startup time, any change
> >>> > > > > > > >> > > > >>> to a broker will require bouncing the entire cluster,
> >>> > > > > > > >> > > > >>> since createTopic requests can be sent to any node in
> >>> > > > > > > >> > > > >>> the cluster. For this reason it may be simpler to have
> >>> > > > > > > >> > > > >>> each node be aware of its own rack and persist it in
> >>> > > > > > > >> > > > >>> ZK at startup time.
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service
> >>> > > > > > > >> > > > >>> being available to serve rack information.
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other
> >>> > > > > > > >> > > > >>> systems deal with zone/rack awareness.
> >>> > > > > > > >> > > > >>> For Cassandra, some interesting modes are:
> >>> > > > > > > >> > > > >>> (Property File configuration)
> >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >>> > > > > > > >> > > > >>> (Dynamic inference)
> >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based
> >>> > > > > > > >> > > > >>> on configuration.
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> Aditya
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration
> >>> > > > > > > >> > > > >>> > with an existing broker-rack mapping
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > - Make rack an optional property for brokers. If rack
> >>> > > > > > > >> > > > >>> > is available from the broker, treat it as the source
> >>> > > > > > > >> > > > >>> > of truth. Users with an existing broker-rack mapping
> >>> > > > > > > >> > > > >>> > somewhere else can use the pluggable way, or they can
> >>> > > > > > > >> > > > >>> > transfer the mapping to the broker rack property.
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens during
> >>> > > > > > > >> > > > >>> > a rolling upgrade when we have rack as a broker
> >>> > > > > > > >> > > > >>> > property. Will it cause problems for brokers running
> >>> > > > > > > >> > > > >>> > an older version of Kafka? If so, is there any
> >>> > > > > > > >> > > > >>> > workaround? I also think it would be better not to
> >>> > > > > > > >> > > > >>> > have rack in the controller wire protocol, but I am
> >>> > > > > > > >> > > > >>> > not sure if that is achievable.
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > Thanks,
> >>> > > > > > > >> > > > >>> > Allen
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpal...@gmail.com> wrote:
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For
> >>> > > > > > > >> > > > >>> > > example, we already have an interface for
> >>> > > > > > > >> > > > >>> > > discovering information about the physical location
> >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to
> >>> > > > > > > >> > > > >>> > > maintain data in multiple places.
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> > > -Todd
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> >>> > > > > > > >> > > > >>> > > aaurad...@linkedin.com.invalid> wrote:
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP, Allen.
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a pluggable
> >>> > > > > > > >> > > > >>> > > > RackLocator class seems too complex. The KIP
> >>> > > > > > > >> > > > >>> > > > refers to potentially non-ZK storage for the rack
> >>> > > > > > > >> > > > >>> > > > info, which I don't think is necessary.
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in ZK under
> >>> > > > > > > >> > > > >>> > > > /brokers/ids/<broker_id>, similar to other broker
> >>> > > > > > > >> > > > >>> > > > properties, and add a config in KafkaConfig
> >>> > > > > > > >> > > > >>> > > > called "rack".
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
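The JSON above, together with the upgrade question raised elsewhere in this thread about readers of the ZooKeeper broker registrations, suggests treating "rack" as an optional field. A sketch of a tolerant reader, assuming the registration is plain JSON shaped like the example; the sample payloads are made up, and `BrokerInfo` and `parse_registration` are hypothetical, not Kafka classes.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrokerInfo:
    broker_id: int
    host: str
    port: int
    rack: Optional[str]  # None when the broker registered without a rack


def parse_registration(broker_id, payload):
    """Parse a /brokers/ids/<broker_id> payload. "rack" is optional and
    unknown keys are ignored, so old and new registrations both parse."""
    data = json.loads(payload)
    return BrokerInfo(broker_id, data["host"], data["port"], data.get("rack"))


new = '{"jmx_port":-1,"endpoints":[],"host":"h1","port":9092,"rack":"abc"}'
old = '{"jmx_port":-1,"endpoints":[],"host":"h2","port":9092}'
print(parse_registration(1, new).rack, parse_registration(2, old).rack)  # abc None
```

A reader written this way is unaffected by the extra key; the upgrade risk sits with any reader that rejects fields it does not recognize.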
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > Aditya
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <g...@confluent.io> wrote:
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > > Hi,
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This
> >>> > > > > > > >> > > > >>> > > > > is super important for production deployments of
> >>> > > > > > > >> > > > >>> > > > > Kafka.
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > A few questions:
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
> >>> > > > > > > >> > > > >>> > > > > possible"? I'd want to balance between safety
> >>> > > > > > > >> > > > >>> > > > > (more racks) and network utilization (traffic
> >>> > > > > > > >> > > > >>> > > > > within a rack uses the high-bandwidth TOR switch).
> >>> > > > > > > >> > > > >>> > > > > One replica on a different rack and the rest on
> >>> > > > > > > >> > > > >>> > > > > the same rack (if possible) sounds better to me.
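The two placement goals in point 1 differ in how many racks one partition's replicas touch. A sketch contrasting them for a single partition, under simplified assumptions (brokers are taken in listed order within a rack; the function names are hypothetical, and neither function is the KIP's algorithm):

```python
def spread_across_racks(by_rack, rf):
    """'As many racks as possible': one broker per rack per round.

    by_rack: dict rack id -> list of broker ids.
    """
    chosen, depth = [], 0
    while len(chosen) < rf:
        before = len(chosen)
        for rack in sorted(by_rack):
            if len(chosen) < rf and depth < len(by_rack[rack]):
                chosen.append(by_rack[rack][depth])
        if len(chosen) == before:
            raise ValueError("fewer brokers than replicas")
        depth += 1
    return chosen


def one_remote_replica(by_rack, rf, home):
    """Alternative from point 1: rf - 1 replicas in the home rack, one elsewhere.

    Assumes the home rack has at least rf - 1 brokers and another rack exists.
    """
    local = by_rack[home][: rf - 1]
    remote_rack = next(r for r in sorted(by_rack) if r != home)
    return local + [by_rack[remote_rack][0]]


def racks_touched(by_rack, replicas):
    rack_of = {b: r for r, brokers in by_rack.items() for b in brokers}
    return len({rack_of[b] for b in replicas})


by_rack = {"a": [0, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]}
spread = spread_across_racks(by_rack, 3)  # [0, 3, 6]
compact = one_remote_replica(by_rack, 3, "a")  # [0, 1, 3]
print(racks_touched(by_rack, spread), racks_touched(by_rack, compact))  # 3 2
```

Taking the first broker in each list as leader, both placements keep at least one copy through any single rack outage. The second keeps one of the two follower fetch streams inside rack "a", which is the TOR-switch saving described above, at the cost of leaving only one replica if rack "a" itself fails.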
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > 2) The rack-locator class seems overly complex
> >>> > > > > > > >> > > > >>> > > > > compared to adding a rack.number property to the
> >>> > > > > > > >> > > > >>> > > > > broker properties file. Why do we want that?
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > Gwen
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxw...@gmail.com> wrote:
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica
> >>> > > > > > > >> > > > >>> > > > > > assignment.
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by
> >>> > > > > > > >> > > > >>> > > > > > the racks in the data center and distribute
> >>> > > > > > > >> > > > >>> > > > > > replicas across racks to provide fault tolerance.
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > Thanks,
> >>> > > > > > > >> > > > >>> > > > > > Allen
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > >
> >>> > > > > > > >> >
> >>> > > > > > > >>
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Neha
> >>>
> >>
> >>
> >
>
